
Announcement


The Alpha Software Forum Participation Guidelines

The Alpha Software Forum is a free forum created for the Alpha Software developer community to ask for help, exchange ideas, and share solutions. Alpha Software strives to create an environment where all members of the community feel safe to participate. To ensure the Alpha Software Forum is a place where all feel welcome, forum participants are expected to behave as follows:
  • Be professional in your conduct
  • Be kind to others
  • Be constructive when giving feedback
  • Be open to new ideas and suggestions
  • Stay on topic


Be sure all comments and threads you post are respectful. Posts that contain any of the following content will be considered a violation of your agreement as a member of the Alpha Software Forum Community and will be moderated:
  • Spam.
  • Vulgar language.
  • Quotes from private conversations without permission, including pricing and other sales-related discussions.
  • Personal attacks, insults, or subtle put-downs.
  • Harassment, bullying, threatening, mocking, shaming, or deriding anyone.
  • Sexist, racist, homophobic, transphobic, ableist, or otherwise discriminatory jokes and language.
  • Sexually explicit or violent material, links, or language.
  • Pirated, hacked, or copyright-infringing material.
  • Encouraging of others to engage in the above behaviors.


If a thread or post is found to contain any of the content outlined above, a moderator may choose to take one of the following actions:
  • Remove the Post or Thread - the content is removed from the forum.
  • Place the User in Moderation - all posts and new threads must be approved by a moderator before they are posted.
  • Temporarily Ban the User - user is banned from forum for a period of time.
  • Permanently Ban the User - user is permanently banned from the forum.


Moderators may also rename posts and threads if they are too generic or do not properly reflect their content.

Moderators may move threads if they have been posted in the incorrect forum.

Threads/Posts questioning specific moderator decisions or actions (such as "why was a user banned?") are not allowed and will be removed.

The owners of Alpha Software Corporation (Forum Owner) reserve the right to remove, edit, move, or close any thread for any reason; or ban any forum member without notice, reason, or explanation.

Community members are encouraged to click the "Report Post" icon in the lower left of a given post if they feel the post is in violation of the rules. This will alert the Moderators to take a look.

Alpha Software Corporation may amend the guidelines from time to time and may also vary the procedures it sets out where appropriate in a particular case. Your agreement to comply with the guidelines will be deemed agreement to any changes to them.



Bonus TIPS for Successful Posting

Try a Search First
It is highly recommended that you search for your topic before posting, as many questions have already been answered in prior posts. As with any search engine, a shorter search term returns more "hits", while a more specific term returns more relevant ones. Searching for "table" might well return every message on the board, while "tablesum" would greatly restrict the number of messages returned.

When you do post
First, make sure you are posting your question in the correct forum. For example, if you post an issue regarding Desktop applications on the Mobile & Browser Applications board, not only will your question not be seen by the appropriate audience, it may also be removed or relocated.

The more detail you provide about your problem or question, the more likely someone is to understand your request and be able to help. A sample database with a minimum of records (and its support files, zipped together) will make it much easier to diagnose issues with your application. Screen shots of error messages are especially helpful.

When explaining how to reproduce your problem, please be as detailed as possible. Describe every step, click-by-click and keypress-by-keypress. Otherwise when others try to duplicate your problem, they may do something slightly different and end up with different results.

A note about attachments
You may only attach one file to each message. Attachment file size is limited to 2MB. If you need to include several files, you may do so by zipping them into a single archive.

If you forgot to attach your files to your post, please do NOT create a new thread. Instead, reply to your original message and attach the file there.

When attaching screen shots, it is best to attach an image file (.BMP, .JPG, .GIF, .PNG, etc.) or a zip file of several images, as opposed to a Word document containing the screen shots. Because Word documents can carry macro viruses, many message board users will not open your Word file, limiting their ability to help you.

Similarly, if you are uploading a zipped archive, you should simply create a .ZIP file and not a self-extracting .EXE as many users will not run your EXE file.

Speed of loading a very large database


    Speed of loading a very large database

    I have a large database (10,000 to 20,000 records per table, with many tables) that is loaded from text files. Once the data is in the database, the reports, etc., work fine. I have written an Xbasic program that parses the information out into the database tables, and it works fine. I have also optimized it as much as I can. The problem is that it takes HOURS (or more than a DAY in some cases) to run through the 5 to 10 MB weirdly formatted text input file while verifying that no duplicates were created. This is unacceptable. If I knew the binary format of the database I could do this in ANSI C (much faster than C++ when done properly) and it would take only minutes (or a couple of hours at most), not hours to days. How do I get the exact binary format of each table so I can do this? I am perfectly willing to sign a non-disclosure and/or non-compete agreement.

    #2
    RE: Speed of loading a very large database

    I don't know if this gets you anywhere, but Alpha tables are FoxPro 2.6 (DBF) files.
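
    If that's right, the basic DBF header layout is publicly documented, so you can at least inspect the tables yourself from C. Below is a minimal sketch that prints the record count and record length from a .dbf header. It assumes the stock xBase/FoxPro 2.6 layout (and note that memo fields live in a separate file), so verify it against your own tables before trusting it:

/* dbfinfo.c - print basic stats from a DBF header.
   Assumes the standard xBase/FoxPro 2.6 layout; check
   against your own tables before relying on it. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned char h[32];
    unsigned long nrecs;
    unsigned int hdrlen, reclen;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: dbfinfo file.dbf\n");
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL || fread(h, 1, 32, f) != 32) {
        fprintf(stderr, "cannot read %s\n", argv[1]);
        return 1;
    }
    /* all multi-byte header fields are little-endian */
    nrecs  = (unsigned long)h[4] | ((unsigned long)h[5] << 8)
           | ((unsigned long)h[6] << 16) | ((unsigned long)h[7] << 24);
    hdrlen = h[8]  | (h[9]  << 8);   /* bytes before the first record       */
    reclen = h[10] | (h[11] << 8);   /* bytes per record, incl. delete flag */
    printf("version byte : 0x%02X\n", h[0]);
    printf("records      : %lu\n", nrecs);
    printf("header length: %u\n", hdrlen);
    printf("record length: %u\n", reclen);
    fclose(f);
    return 0;
}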

    Bill
    Bill Hanigsberg

    Comment


      #3
      RE: Speed of loading a very large database

      Richard:

      You sound quite knowledgeable. However, a 20,000-record import should take a few minutes, not hours or days.

      It's hard for us to establish much from your description, but if you were to post your actual procedure, and a sample of your database, we might be able to refine your steps.

      If you cannot post a sample, try backing up your entire application, then breaking it down piece by piece to determine where the bottleneck is.

      I'd look very closely at the field rules. As the table appends or imports data, the field rules being enforced can slow the process. Rules that use lookupc variants are notorious for slowing imports and appends.

      I'd also consider removing the duplicates after the records have been imported, if this is a possibility.

      Also, how many indexes are associated with each table, and what expressions do they evaluate?

      For a quick before/after evaluation, after backing up, remove all field rules, indexes, and the duplicate filtering, and time each import. At least you'll establish what times are possible.

      Sorry this is as weak an answer as it is, but without a sample of things like "weirdly formatted text" files, it's hard to draw any conclusions!

      Craig

      Comment


        #4
        RE: Speed of loading a very large database

        I realize you wrote that you have optimized the Xbasic as much as possible, but the timing still seems odd to me, since you are only reading a 5 to 10 MB input file. If you have an average record size of, say, 500 characters, then a 5 MB input file comes to only 10,000 records. I can't see how it could take a day to read and process 10,000 records from a text file. Are you, for example, repeatedly opening and closing files? How many different record types are you reading in?

        Comment


          #5
          RE: Speed of loading a very large database

          Has this behaviour been constant, or has it slowed down recently? Assuming the database is on a server, have you tested your connection speed? Have you copied the data to a local machine and compared the time to process locally?

          From your comments, I assume you are a 'C' programmer, so here are some low-level insights for you that could make a dramatic difference, even after you find and correct other issues that are blocking the speed.

          I suggest you write a 'C' program to do the front-end work allowing Alpha 5 do a quicker import. A decent 'C' program should be able to process 10MB of data, converting your "weirdly formatted text input" into a standard comma-delimited "Alpha-friendly" text file in a matter of seconds, possibly minutes (seconds if you do your own disk buffering with read/write buffers at least 32k in size, but much longer with normal 'C' runtime libraries that read/write one record at a time - I speak from much experience here!).
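
          To illustrate, here is a minimal sketch of that front-end conversion in ANSI C. The input layout (fixed-width fields) and file names are invented for the example; substitute your own "weird" format. Rather than hand-rolled buffering, it simply enlarges stdio's buffers to 32 KB with setvbuf(), which captures most of the benefit:

/* fwconv.c - convert a fixed-width text file to comma-delimited
   output using large stdio buffers.  The field positions below
   are placeholders; substitute the layout of your own input. */
#include <stdio.h>
#include <string.h>

#define BUFSZ (32 * 1024)   /* 32 KB read/write buffers */

static char inbuf[BUFSZ], outbuf[BUFSZ];

int main(void)
{
    char line[1024], name[41], amount[13];
    FILE *in  = fopen("input.txt", "r");
    FILE *out = fopen("import.csv", "w");

    if (in == NULL || out == NULL) {
        fprintf(stderr, "cannot open files\n");
        return 1;
    }
    setvbuf(in,  inbuf,  _IOFBF, BUFSZ);   /* enlarge stdio buffers so    */
    setvbuf(out, outbuf, _IOFBF, BUFSZ);   /* the disk is hit in 32K runs */

    while (fgets(line, sizeof line, in) != NULL) {
        /* hypothetical layout: name in cols 1-40, amount in cols 41-52 */
        if (strlen(line) < 52)
            continue;                      /* skip short/garbage lines */
        memcpy(name,   line,      40); name[40]   = '\0';
        memcpy(amount, line + 40, 12); amount[12] = '\0';
        fprintf(out, "\"%s\",%s\n", name, amount);
    }
    fclose(in);
    fclose(out);
    return 0;
}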

          Then, use Alpha 5 to import the tables and avoid Xbasic manipulations, which would be much slower than 'C' code. Also note that any active indexes will slow things down when appending, as each index must be updated one by one; rebuilding an index is many times (hundreds??) faster than inserting records one at a time in multiple indexes.

          So, if your logic allows, drop all indexes for the files you are appending to, do the import, then rebuild the indexes when done. Doing this will dramatically reduce disk seeks, giving you at least one, maybe more, orders of magnitude speed improvement.

          Eric

          Comment


            #6
            RE: Speed of loading a very large database

            One more thing: filtering out the duplicates would normally require an active index, but again, try to do the main "dirty work" of text manipulation in 'C', then import the data and index it. Then, finally, delete duplicates as the last step.
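
            If the duplicate test can be expressed on the text side, you can even do the filtering in the same C stage, before Alpha 5 ever sees the data. A sketch, assuming whole lines are the duplicate key and the file fits comfortably in memory (a 5 to 10 MB input does):

/* dedupe.c - drop duplicate lines from the intermediate file
   before importing, so no unique index is needed during the
   append.  Loads the whole file into memory; fine for a
   5-10 MB input. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp(const void *a, const void *b)
{
    return strcmp(*(char * const *)a, *(char * const *)b);
}

int main(void)
{
    char line[1024], **v = NULL;
    size_t n = 0, cap = 0, i;
    FILE *in  = fopen("import.csv", "r");
    FILE *out = fopen("import_unique.csv", "w");

    if (in == NULL || out == NULL)
        return 1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (n == cap) {                     /* grow the pointer array */
            cap = cap ? cap * 2 : 1024;
            v = realloc(v, cap * sizeof *v);
        }
        v[n] = malloc(strlen(line) + 1);
        strcpy(v[n], line);
        n++;
    }
    qsort(v, n, sizeof *v, cmp);
    for (i = 0; i < n; i++)                 /* keep the first of each run */
        if (i == 0 || strcmp(v[i], v[i - 1]) != 0)
            fputs(v[i], out);
    fclose(in);
    fclose(out);
    return 0;
}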

            Eric

            Comment


              #7
              RE: Speed of loading a very large database

              Richard:

              I import files ranging from 500 to 1,500,000 records on a regular basis.

              I just ran a time test on a 700,000-record .asc file. Total time to import: 1 minute and 20 seconds.

              As suggested by others we would have to see the code you are using.

              I hope this helps.

              bob adler

              Comment


                #8
                RE: Speed of loading a very large database

                I turned off all indexing and cleared the tables, and the thing flies! The speed problem was definitely related to indexing. In hindsight, considering that each table had about 3 or 4 indexes that had to be updated after each entry, this isn't too surprising. My problem now is that some of the "new" incoming data is supposed to be updates to the existing data, and the query will probably regenerate the required indexes on each table, right? If this is true, how do I get around the speed problem and still be able to find and update existing records?

                Comment


                  #9
                  RE: Speed of loading a very large database

                  You might want to try segregating the update records from the records that are to be appended. BTW, just how much faster does your process operate with indexes turned off? I'm interested in knowing.
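
                  As a sketch of that segregation step, again done in C before the import: assume the existing keys have been exported one per line to keys.txt, and that the key is the first comma-delimited field of each incoming row (both the file names and the layout are assumptions for illustration, not anything Alpha-specific):

/* split.c - segregate incoming rows into appends vs. updates.
   Assumes existing keys were exported one per line to keys.txt
   and that keys are under 32 characters long. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXKEYS 100000
#define KEYLEN  32

static char keys[MAXKEYS][KEYLEN];
static size_t nkeys;

static int cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

int main(void)
{
    char line[1024], key[KEYLEN];
    size_t len;
    FILE *kf  = fopen("keys.txt", "r");
    FILE *in  = fopen("import_unique.csv", "r");
    FILE *app = fopen("appends.csv", "w");
    FILE *upd = fopen("updates.csv", "w");

    if (!kf || !in || !app || !upd)
        return 1;
    while (nkeys < MAXKEYS && fgets(keys[nkeys], KEYLEN, kf) != NULL) {
        keys[nkeys][strcspn(keys[nkeys], "\r\n")] = '\0';
        nkeys++;
    }
    qsort(keys, nkeys, KEYLEN, cmp);        /* sort once, search many */

    while (fgets(line, sizeof line, in) != NULL) {
        len = strcspn(line, ",");           /* key = first field */
        if (len >= KEYLEN)
            len = KEYLEN - 1;
        memcpy(key, line, len);
        key[len] = '\0';
        if (bsearch(key, keys, nkeys, KEYLEN, cmp) != NULL)
            fputs(line, upd);               /* key exists: update */
        else
            fputs(line, app);               /* key is new: append */
    }
    fclose(kf); fclose(in); fclose(app); fclose(upd);
    return 0;
}

                  The updates file can then be applied against the existing (indexed) table, while the appends file goes through the fast no-index append-and-reindex path.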

                  Eric

                  Comment


                    #10
                    RE: Speed of loading a very large database

                    I processed 15,000 records in 31 and a fraction minutes without any indexing. Yesterday I processed 13,100 (exact same file, but I aborted it before finishing) in about 6 hours. By turning off ALL indexing I improved the speed by a factor of 13.9, i.e., nearly 14 times the original rate. Obviously I'm going to have to give some of that back, but segregating the "new" records into another table, then generating the indexes for each table only once and querying the old table in batches, should still be much faster. I anticipate giving back only about 3% to 5%, which still leaves me 10X faster than before. WHAT A DIFFERENCE!

                    Comment


                      #11
                      RE: Speed of loading a very large database

                      Richard,

                      That speed improvement is great! After you segregate your processes, you may still want to shave some time off; 31 minutes still seems to me WAY too long. There's almost always a way to improve, depending on how important speed is to you. Personally, I'm a speed freak when it comes to computer processing. I've seen processes drop from 4 hours down to under 3 minutes with a bit of rethinking, redesigning and tweaking.

                      I'd be interested in the difference between what you are doing and what Robert Adler was doing (to import 700,000 records in 1.5 minutes). Something is apparently still slowing you way down. I'm rather new to A5 with very little time now to experiment, but in the future I'll play with varying options to determine faster ways to process huge amounts of data. If anyone else has done performance testing, I'd love to know what they found out and where they left off.

                      Eric

                      Comment


                        #12
                        RE: Speed of loading a very large database

                        It's the complex text parsing and translation of the input data into multiple tables that takes the time. The run is now only about twice as long as a somewhat similar application I wrote in ANSI C (for Windows), which loaded a Microsoft Access database through an SQL wrapper at roughly twice this speed. Since the application will run on full autopilot in the middle of the night, the time it would take to optimize any further is not cost-effective. If I can keep it to three quarters of an hour or less, it will work fine for this application. Someday when I have time (probably after I retire) I may optimize it just to see how fast I can make it run. :)

                        Comment


                          #13
                          RE: Speed of loading a very large database

                          After 2 days of playing (at 16 hrs per day), and one small but architecturally valid change of assumptions, I was able to process 43,675 records in under 10 minutes (machine and hard drive dependent). The resulting tables are exactly the same as those from the 6-hour run, for the partial data that run processed. The key is eliminating ALL indexes (so they don't have to be updated after each entry), then fixing duplicates and regenerating the indexes after all the data has been processed. I've found several new ways to totally crash the OS (which is not unusual for this type of code development), but I'm starting to like this database more every day.

                          Comment

