
Announcement


The Alpha Software Forum Participation Guidelines

The Alpha Software Forum is a free forum created for the Alpha Software developer community to ask for help, exchange ideas, and share solutions. Alpha Software strives to create an environment where all members of the community feel safe to participate. To ensure the Alpha Software Forum is a place where all feel welcome, forum participants are expected to behave as follows:
  • Be professional in your conduct
  • Be kind to others
  • Be constructive when giving feedback
  • Be open to new ideas and suggestions
  • Stay on topic


Be sure all comments and threads you post are respectful. Posts that contain any of the following content will be considered a violation of your agreement as a member of the Alpha Software Forum Community and will be moderated:
  • Spam.
  • Vulgar language.
  • Quotes from private conversations without permission, including pricing and other sales-related discussions.
  • Personal attacks, insults, or subtle put-downs.
  • Harassment, bullying, threatening, mocking, shaming, or deriding anyone.
  • Sexist, racist, homophobic, transphobic, ableist, or otherwise discriminatory jokes and language.
  • Sexually explicit or violent material, links, or language.
  • Pirated, hacked, or copyright-infringing material.
  • Encouraging of others to engage in the above behaviors.


If a thread or post is found to contain any of the content outlined above, a moderator may choose to take one of the following actions:
  • Remove the Post or Thread - the content is removed from the forum.
  • Place the User in Moderation - all posts and new threads must be approved by a moderator before they are posted.
  • Temporarily Ban the User - user is banned from forum for a period of time.
  • Permanently Ban the User - user is permanently banned from the forum.


Moderators may also rename posts and threads if they are too generic or do not properly reflect their content.

Moderators may move threads if they have been posted in the incorrect forum.

Threads/Posts questioning specific moderator decisions or actions (such as "why was a user banned?") are not allowed and will be removed.

The owners of Alpha Software Corporation (Forum Owner) reserve the right to remove, edit, move, or close any thread for any reason; or ban any forum member without notice, reason, or explanation.

Community members are encouraged to click the "Report Post" icon in the lower left of a given post if they feel the post is in violation of the rules. This will alert the Moderators to take a look.

Alpha Software Corporation may amend the guidelines from time to time and may also vary the procedures it sets out where appropriate in a particular case. Your agreement to comply with the guidelines will be deemed agreement to any changes to them.



Bonus TIPS for Successful Posting

Try a Search First
It is highly recommended that you search for your topic before posting, as many questions have already been answered in prior posts. As with any search engine, a shorter search term returns more "hits", while a more specific term returns more relevant ones. Searching for "table" might well return every message on the board, while "tablesum" would greatly restrict the number of messages returned.

When you do post
First, make sure you are posting your question in the correct forum. For example, if you post an issue regarding Desktop applications on the Mobile & Browser Applications board, not only will your question not be seen by the appropriate audience, it may also be removed or relocated.

The more detail you provide about your problem or question, the more likely someone is to understand your request and be able to help. A sample database with a minimum of records (and its support files, zipped together) will make it much easier to diagnose issues with your application. Screen shots of error messages are especially helpful.

When explaining how to reproduce your problem, please be as detailed as possible. Describe every step, click-by-click and keypress-by-keypress. Otherwise when others try to duplicate your problem, they may do something slightly different and end up with different results.

A note about attachments
You may only attach one file to each message. Attachment file size is limited to 2MB. If you need to include several files, you may do so by zipping them into a single archive.

If you forgot to attach your files to your post, please do NOT create a new thread. Instead, reply to your original message and attach the file there.

When attaching screen shots, it is best to attach an image file (.BMP, .JPG, .GIF, .PNG, etc.) or a zip file of several images, as opposed to a Word document containing the screen shots. Because Word documents can carry macro viruses, many message board users will not open your Word file, limiting their ability to help you.

Similarly, if you are uploading a zipped archive, you should simply create a .ZIP file and not a self-extracting .EXE as many users will not run your EXE file.

Speed of loading a very large database


    Speed of loading a very large database

    I have a large database (10,000 to 20,000 records per table, with many tables) that is loaded from text files. Once the data is in the database, the reports, etc., work fine. I have written an Xbasic program that parses the information out into the database tables, and it works fine. I have also optimized it as much as I can. The problem is that it takes HOURS (or more than a DAY in some cases) to run through the 5 to 10 MB weirdly formatted text input file while verifying that no duplicates were created. This is unacceptable. If I knew the binary format of the database I could do this in ANSI C (much faster than C++ when done properly) and it would take only minutes (or a couple of hours at most), not hours to days. How do I get the exact binary format of each table so I can do this? I am perfectly willing to sign a non-disclosure and/or non-compete agreement.

    #2
    RE: Speed of loading a very large database

    I don't know if this gets you anywhere, but Alpha tables are FoxPro 2.6 (DBF) files.
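
    If that's right, the basic DBF header layout is publicly documented, so you can at least inspect the tables yourself from C. Below is a minimal sketch that prints the record count and record length from a .dbf header. It assumes the stock xBase/FoxPro 2.6 layout (and note that memo fields live in a separate file), so verify it against your own tables before trusting it:

/* dbfinfo.c - print basic stats from a DBF header.
   Assumes the standard xBase/FoxPro 2.6 layout; check
   against your own tables before relying on it. */
#include <stdio.h>

int main(int argc, char *argv[])
{
    unsigned char h[32];
    unsigned long nrecs;
    unsigned int hdrlen, reclen;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: dbfinfo file.dbf\n");
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL || fread(h, 1, 32, f) != 32) {
        fprintf(stderr, "cannot read %s\n", argv[1]);
        return 1;
    }
    /* all multi-byte header fields are little-endian */
    nrecs  = (unsigned long)h[4] | ((unsigned long)h[5] << 8)
           | ((unsigned long)h[6] << 16) | ((unsigned long)h[7] << 24);
    hdrlen = h[8]  | (h[9]  << 8);   /* bytes before the first record       */
    reclen = h[10] | (h[11] << 8);   /* bytes per record, incl. delete flag */
    printf("version byte : 0x%02X\n", h[0]);
    printf("records      : %lu\n", nrecs);
    printf("header length: %u\n", hdrlen);
    printf("record length: %u\n", reclen);
    fclose(f);
    return 0;
}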

    Bill
    Bill Hanigsberg

    Comment


      #3
      RE: Speed of loading a very large database

      Richard:

      You sound quite knowledgeable. However, a 20,000-record import should take a few minutes, not hours or days.

      It's hard for us to establish much from your description, but if you were to post your actual procedure, and a sample of your database, we might be able to refine your steps.

      If you cannot post a sample, try backing up your entire application, then breaking it down piece by piece to determine where the bottleneck is.

      I'd look very closely at the field rules. As the table appends or imports data, the field rules being enforced can slow the process. Rules that use lookupc variants are notorious for slowing imports and appends.

      I'd also consider removing the duplicates after the records have been imported, if this is a possibility.

      Also, how many indexes are associated with each table, and what expressions do they evaluate?

      For a quick before/after evaluation, after backing up, remove all field rules, indexes, and the duplicate filtering, and time each import. At least you'll establish what times are possible.

      Sorry this is as weak an answer as it is, but without a sample of things like "weirdly formatted text" files, it's hard to draw any conclusions!

      Craig

      Comment


        #4
        RE: Speed of loading a very large database

        I realize you wrote that you have optimized the Xbasic as much as possible, but the timing still seems odd to me, since you are only reading a 5 to 10 MB input file. If you have an average record size of, say, 500 characters, then a 5 MB input file comes to only 10,000 records. I can't see how it could take a day to read and process 10,000 records from a text file. Are you, for example, repeatedly opening and closing files? How many different record types are you reading in?

        Comment


          #5
          RE: Speed of loading a very large database

          Has this behaviour been constant, or has it slowed down recently? Assuming the database is on a server, have you tested your connection speed? Have you copied the data to a local machine and compared the time to process locally?

          From your comments, I assume you are a 'C' programmer, so here are some low-level insights for you that could make a dramatic difference, even after you find and correct other issues that are blocking the speed.

          I suggest you write a 'C' program to do the front-end work allowing Alpha 5 do a quicker import. A decent 'C' program should be able to process 10MB of data, converting your "weirdly formatted text input" into a standard comma-delimited "Alpha-friendly" text file in a matter of seconds, possibly minutes (seconds if you do your own disk buffering with read/write buffers at least 32k in size, but much longer with normal 'C' runtime libraries that read/write one record at a time - I speak from much experience here!).
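
          To illustrate, here is a minimal sketch of that front-end conversion in ANSI C. The input layout (fixed-width fields) and file names are invented for the example; substitute your own "weird" format. Rather than hand-rolled buffering, it simply enlarges stdio's buffers to 32 KB with setvbuf(), which captures most of the benefit:

/* fwconv.c - convert a fixed-width text file to comma-delimited
   output using large stdio buffers.  The field positions below
   are placeholders; substitute the layout of your own input. */
#include <stdio.h>
#include <string.h>

#define BUFSZ (32 * 1024)   /* 32 KB read/write buffers */

static char inbuf[BUFSZ], outbuf[BUFSZ];

int main(void)
{
    char line[1024], name[41], amount[13];
    FILE *in  = fopen("input.txt", "r");
    FILE *out = fopen("import.csv", "w");

    if (in == NULL || out == NULL) {
        fprintf(stderr, "cannot open files\n");
        return 1;
    }
    setvbuf(in,  inbuf,  _IOFBF, BUFSZ);   /* enlarge stdio buffers so    */
    setvbuf(out, outbuf, _IOFBF, BUFSZ);   /* the disk is hit in 32K runs */

    while (fgets(line, sizeof line, in) != NULL) {
        /* hypothetical layout: name in cols 1-40, amount in cols 41-52 */
        if (strlen(line) < 52)
            continue;                      /* skip short/garbage lines */
        memcpy(name,   line,      40); name[40]   = '\0';
        memcpy(amount, line + 40, 12); amount[12] = '\0';
        fprintf(out, "\"%s\",%s\n", name, amount);
    }
    fclose(in);
    fclose(out);
    return 0;
}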

          Then, use Alpha 5 to import the tables and avoid Xbasic manipulations, which would be much slower than 'C' code. Also note that any active indexes will slow things down when appending, as each index must be updated one by one; rebuilding an index is many times (hundreds??) faster than inserting records one at a time in multiple indexes.

          So, if your logic allows, drop all indexes for the files you are appending to, do the import, then rebuild the indexes when done. Doing this will dramatically reduce disk seeks, giving you at least one, maybe more, orders of magnitude speed improvement.

          Eric

          Comment


            #6
            RE: Speed of loading a very large database

            One more thing: filtering out the duplicates would normally require an active index, but again, try to do the main "dirty work" of text manipulation in 'C', then import the data and index it. Then, finally, delete duplicates as the last step.
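
            If the duplicate test can be expressed on the text side, you can even do the filtering in the same C stage, before Alpha 5 ever sees the data. A sketch, assuming whole lines are the duplicate key and the file fits comfortably in memory (a 5 to 10 MB input does):

/* dedupe.c - drop duplicate lines from the intermediate file
   before importing, so no unique index is needed during the
   append.  Loads the whole file into memory; fine for a
   5-10 MB input. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp(const void *a, const void *b)
{
    return strcmp(*(char * const *)a, *(char * const *)b);
}

int main(void)
{
    char line[1024], **v = NULL;
    size_t n = 0, cap = 0, i;
    FILE *in  = fopen("import.csv", "r");
    FILE *out = fopen("import_unique.csv", "w");

    if (in == NULL || out == NULL)
        return 1;
    while (fgets(line, sizeof line, in) != NULL) {
        if (n == cap) {                     /* grow the pointer array */
            cap = cap ? cap * 2 : 1024;
            v = realloc(v, cap * sizeof *v);
        }
        v[n] = malloc(strlen(line) + 1);
        strcpy(v[n], line);
        n++;
    }
    qsort(v, n, sizeof *v, cmp);
    for (i = 0; i < n; i++)                 /* keep the first of each run */
        if (i == 0 || strcmp(v[i], v[i - 1]) != 0)
            fputs(v[i], out);
    fclose(in);
    fclose(out);
    return 0;
}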

            Eric

            Comment


              #7
              RE: Speed of loading a very large database

              Richard:

              I import files ranging from 500 to 1,500,000 records on a regular basis.

              I just ran a time test on a 700,000-record .asc file. Total time to import: 1 minute and 20 seconds.

              As suggested by others we would have to see the code you are using.

              I hope this helps.

              bob adler

              Comment


                #8
                RE: Speed of loading a very large database

                I turned off all indexing and cleared the tables, and the thing flies! The speed problem was definitely related to indexing. In hindsight, considering that each table had about 3 or 4 indexes that had to be updated after each entry, this isn't too surprising. My problem now is that some of the "new" incoming data is supposed to be updates to the existing data, and the query will probably regenerate the required indexes on each table, right? If this is true, how do I get around the speed problem and still be able to find and update existing records?

                Comment


                  #9
                  RE: Speed of loading a very large database

                  You might want to try segregating the update records from the records that are to be appended. BTW, just how much faster does your process operate with indexes turned off? I'm interested in knowing.
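
                  As a sketch of that segregation step, again done in C before the import: assume the existing keys have been exported one per line to keys.txt, and that the key is the first comma-delimited field of each incoming row (both the file names and the layout are assumptions for illustration, not anything Alpha-specific):

/* split.c - segregate incoming rows into appends vs. updates.
   Assumes existing keys were exported one per line to keys.txt
   and that keys are under 32 characters long. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXKEYS 100000
#define KEYLEN  32

static char keys[MAXKEYS][KEYLEN];
static size_t nkeys;

static int cmp(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

int main(void)
{
    char line[1024], key[KEYLEN];
    size_t len;
    FILE *kf  = fopen("keys.txt", "r");
    FILE *in  = fopen("import_unique.csv", "r");
    FILE *app = fopen("appends.csv", "w");
    FILE *upd = fopen("updates.csv", "w");

    if (!kf || !in || !app || !upd)
        return 1;
    while (nkeys < MAXKEYS && fgets(keys[nkeys], KEYLEN, kf) != NULL) {
        keys[nkeys][strcspn(keys[nkeys], "\r\n")] = '\0';
        nkeys++;
    }
    qsort(keys, nkeys, KEYLEN, cmp);        /* sort once, search many */

    while (fgets(line, sizeof line, in) != NULL) {
        len = strcspn(line, ",");           /* key = first field */
        if (len >= KEYLEN)
            len = KEYLEN - 1;
        memcpy(key, line, len);
        key[len] = '\0';
        if (bsearch(key, keys, nkeys, KEYLEN, cmp) != NULL)
            fputs(line, upd);               /* key exists: update */
        else
            fputs(line, app);               /* key is new: append */
    }
    fclose(kf); fclose(in); fclose(app); fclose(upd);
    return 0;
}

                  The updates file can then be applied against the existing (indexed) table, while the appends file goes through the fast no-index append-and-reindex path.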

                  Eric

                  Comment


                    #10
                    RE: Speed of loading a very large database

                    I processed 15,000 records in 31 and a fraction minutes without any indexing. Yesterday I processed 13,100 (exact same file, but I aborted it before finishing) in about 6 hours. By turning off ALL indexing I improved the speed by a factor of 13.9, i.e., nearly 14 times the original rate. Obviously I'm going to have to give some of that back, but segregating the "new" records into another table, then generating the indexes for each table only once and querying the old table in batches, should still be much faster. I anticipate giving back only about 3% to 5%, which still leaves me 10X faster than before. WHAT A DIFFERENCE!

                    Comment


                      #11
                      RE: Speed of loading a very large database

                      Richard,

                      That speed improvement is great! After you segregate your processes, you may still want to shave some time off; 31 minutes still seems to me WAY too long. There's almost always a way to improve, depending on how important speed is to you. Personally, I'm a speed freak when it comes to computer processing. I've seen processes drop from 4 hours down to under 3 minutes with a bit of rethinking, redesigning and tweaking.

                      I'd be interested in the difference between what you are doing and what Robert Adler was doing (to import 700,000 records in 1.5 minutes). Something is apparently still slowing you way down. I'm rather new to A5 with very little time now to experiment, but in the future I'll play with varying options to determine faster ways to process huge amounts of data. If anyone else has done performance testing, I'd love to know what they found out and where they left off.

                      Eric

                      Comment


                        #12
                        RE: Speed of loading a very large database

                        It's the complex text parsing and translation of the input data into multiple tables that takes the time. The run is now only about twice as long as a somewhat similar application I wrote in ANSI C (for Windows), which loaded a Microsoft Access database through an SQL wrapper at roughly twice this speed. Since the application will run on full autopilot in the middle of the night, the time it would take to optimize any further is not cost-effective. If I can keep it to three quarters of an hour or less, it will work fine for this application. Someday when I have time (probably after I retire) I may optimize it just to see how fast I can make it run. :)

                        Comment


                          #13
                          RE: Speed of loading a very large database

                          After 2 days of playing (at 16 hrs per day), and one small but architecturally valid change of assumptions, I was able to process 43,675 records in under 10 minutes (machine and hard drive dependent). The resulting tables are exactly the same as those from the 6-hour run, for the partial data that run processed. The key is eliminating ALL indexes (so they don't have to be updated after each entry), then fixing duplicates and regenerating the indexes after all the data has been processed. I've found several new ways to totally crash the OS (which is not unusual for this type of code development), but I'm starting to like this database more every day.

                          Comment

