Announcement

**Tom Cone Jr** · 11-22-2006, 02:37 PM

Re: Can this routine be made quicker!

Two quick comments:

1) This code would be much, much easier to read if it were formatted. The message board removes all formatting unless the block of code is delimited with CODE delimiters, which are available in the advanced message reply editor.

2) Ideas we come up with could be tested and validated before replying if specimen tables, with their indexes, were supplied.

-- tom

**csda1** · 11-22-2006, 02:44 PM

Re: Can this routine be made quicker!

Chris,

Yes, it can be made faster. Currently, the number of record fetches is the number of match table records times the number of process table records. Instead, create an index in one of the tables (probably the larger one), then use Key_Exist() and LQO queries to quickly return the records to process in the inner WHILE loop.
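To put numbers on that: a nested loop performs (match records × process records) comparisons, while building an index once turns each outer record into a single lookup. The sketch below is in Python rather than Xbasic, uses a dict as a stand-in for an Alpha Five index, and assumes exact key equality; the function names are hypothetical.

```python
from collections import defaultdict


def nested_loop_matches(match_keys, process_keys):
    """Compare every match key with every process key.

    Returns (matches, comparisons); comparisons is always
    len(match_keys) * len(process_keys).
    """
    comparisons = 0
    matches = []
    for m in match_keys:
        for p in process_keys:
            comparisons += 1
            if m == p:
                matches.append((m, p))
    return matches, comparisons


def indexed_matches(match_keys, process_keys):
    """Build an index (key -> process positions) once, then look each match key up."""
    index = defaultdict(list)
    for pos, p in enumerate(process_keys):
        index[p].append(pos)
    lookups = 0
    matches = []
    for m in match_keys:
        lookups += 1
        # .get() avoids inserting empty entries into the defaultdict
        for pos in index.get(m, []):
            matches.append((m, process_keys[pos]))
    return matches, lookups
```

For 1,000 match records and 100,000 process records the nested version makes 100 million comparisons, while the indexed version makes one pass to build the index plus 1,000 lookups.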

There might also be ways to use the Summarize, Intersect and/or Crosstab operations to create a table indicating which records match, and that would probably be even quicker.

**Lenny Forziati** · 11-22-2006, 02:47 PM

Re: Can this routine be made quicker!

There's also the Subtract Records operation.
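Intersect and Subtract Records both compare whole key values between two tables. A rough Python sketch of the idea (not Alpha Five's implementation; the `key` callback and the function names are hypothetical):

```python
def intersect_records(process_rows, match_keys, key):
    """Rows whose key value also appears among the match keys (like Intersect)."""
    wanted = set(match_keys)
    return [row for row in process_rows if key(row) in wanted]


def subtract_records(process_rows, match_keys, key):
    """Rows whose key value does NOT appear among the match keys (like Subtract Records)."""
    unwanted = set(match_keys)
    return [row for row in process_rows if key(row) not in unwanted]
```

Because both tests are exact set membership on the key, neither helps when the match is a partial string somewhere inside a field.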

**Tom Cone Jr** · 11-22-2006, 02:50 PM

Re: Can this routine be made quicker!

Chris,

Is there any possibility that more than one record in the process table will match the same match table record? If not, your script could exit the inner loop as soon as a match is found. There'd be no reason to check the rest of the process table records, if you see what I mean.
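The early exit Tom describes, sketched in Python (hypothetical function name; exact equality is assumed as the "match" test):

```python
def first_match_index(match_value, process_values):
    """Return the position of the first process value equal to match_value, or -1.

    Stops scanning as soon as a match is found instead of reading the
    remaining process records.
    """
    for pos, value in enumerate(process_values):
        if value == match_value:
            return pos  # early exit: the rest of the table is never read
    return -1
```

On average this halves the inner-loop work when every match record has exactly one hit, but it only applies if one process record per match record is enough.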

-- tom

**Steve Workings** · 11-22-2006, 02:55 PM

Re: Can this routine be made quicker!

I dunno if Ira's suggestion will work. I started to write the same response, but realized that the $ function that you need to make this work isn't compatible with LQO and indexing methods. Lenny and Tom make suggestions that seem worth trying.

**Chris.Tanti** · 11-22-2006, 03:16 PM

Re: Can this routine be made quicker!

Hi everyone,

Not sure an index will help: the routine looks for a string in the long address line field. The match could appear anywhere within this field, and it's usually based on a partial string match.
E.g. the process file could have TOWN 1,CF11 1TH and the match file will have TOWN 1 , CF11 1 and consider that a close enough match.
These two strings may or may not appear together.
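A minimal sketch of that kind of partial match, in Python, assuming a match record carries two fields (match1 and match2, the names used in Tom's code later in the thread) and that both must occur somewhere in the address. Treating a blank field as "no constraint" and ignoring case are assumptions, not necessarily what the original routine does.

```python
def is_partial_match(address, match1, match2):
    """True if both trimmed match fields occur anywhere in the address.

    Assumptions: a blank match field imposes no constraint, and the
    comparison is case-insensitive.
    """
    haystack = address.upper()
    for needle in (match1.strip(), match2.strip()):
        if needle and needle.upper() not in haystack:
            return False
    return True
```

Since the substring can sit anywhere in the field, no sorted index can narrow the search; every address still has to be examined.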

And the match file could refer to many (all or any) of the records in the sample process file; it's even possible that a match record could match the same record several times.

Just had a look at Subtract, and I think it works with matching keys, so once again I don't think it comes into play when hunting for partial strings within a field.

It takes a heck of a long time to crunch through the records, and I remember we had an old DOS dBASE 5 app that did a similar job way quicker (but it wasn't terribly flexible or friendly to use).

Have attached a sample DB.

**Tom Cone Jr** · 11-22-2006, 04:40 PM

Re: Can this routine be made quicker!

Chris,

If your match record could match several process table records then your counting system is going to be off. The same match table record would be changed each time a qualifying process table record is found. Is this significant to your process, or do you really only need to know if AT LEAST one match is found?

Later:

I took a look. Cannot find your script. Please describe the results you need to achieve as specifically as you can. (a) I've been assuming records "match" when they match exactly. It now appears you have something else in mind. (b) Of what significance is the running count of matched records? (c) How should multiple "matches" be handled?

-- tom

**Chris.Tanti** · 11-22-2006, 04:55 PM

Re: Can this routine be made quicker!

Hi Tom,

Yep, that was just an example. It's basically down to the person running the process to create a good match file with no overlaps (if that's what they want). There are a few occasions when we may want one match to take priority over another, so in those cases (if there's the possibility of a multi-match) they may key them in order of priority.
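One way to read "key them in order of priority" is that each process record goes to the first match record, in file order, that fits it. A minimal Python sketch under that assumption, using plain substring tests on two hypothetical match fields:

```python
def assign_by_priority(addresses, match_rows):
    """Give each address to the first match row (in priority order) that fits.

    match_rows is a list of (match1, match2) tuples, highest priority first.
    Returns (assignment, counts): assignment[i] is the index of the match row
    that claimed address i (or None); counts[j] is how many addresses row j claimed.
    """
    assignment = [None] * len(addresses)
    counts = [0] * len(match_rows)
    for i, address in enumerate(addresses):
        for j, (m1, m2) in enumerate(match_rows):
            if m1 in address and m2 in address:
                assignment[i] = j
                counts[j] += 1
                break  # a higher-priority row already won; stop looking
    return assignment, counts
```

This also gives each match record a well-defined count, which addresses Tom's concern that the running count would be off when one record matches several times.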

**Chris.Tanti** · 11-22-2006, 04:59 PM

Re: Can this routine be made quicker!

Sorry about the zip: you need to add a table called tbl_menu; the code is on a form called menu! Don't know why the table got dropped (it should be in the original zip).

Have re-attached a hopefully correct version of the zip file with the form visible.

**Steve Workings** · 11-22-2006, 05:55 PM

Re: Can this routine be made quicker!

If a record is found that matches, can it be excluded from further consideration?

If so, add a flag field to the table-- let's assume it's a logical field named "Found".

Set up an index, but filtered for "found = .f."

Do your searching through all records in the index, which will initially be all records in the table.

For each match that is found, change Found to .t.

Continue subsequent searches in the filtered index, which will mean fewer and fewer records searched as the process continues.

There's a bit of overhead involved to change the value of Found when a match is located, but I think it would pay off quite well.
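Steve's steps can be sketched in Python with a shrinking list of unmatched record positions standing in for the index filtered on Found = .f. (an assumption; Alpha Five maintains the filtered index itself). The substring test on two match fields is also an assumption carried over from Chris's description.

```python
def match_with_found_flag(addresses, match_rows):
    """Search only records not yet found, shrinking the pool as matches are made.

    Once an address is matched it is removed from the candidate list
    (the equivalent of setting Found to .t.) and never examined again.
    Returns a dict mapping match-row index -> list of address indexes it claimed.
    """
    unfound = list(range(len(addresses)))  # stands in for the filtered index
    claimed = {}
    for j, (m1, m2) in enumerate(match_rows):
        hits = [i for i in unfound if m1 in addresses[i] and m2 in addresses[i]]
        claimed[j] = hits
        hit_set = set(hits)
        unfound = [i for i in unfound if i not in hit_set]  # mark as found
    return claimed
```

Because earlier match rows claim records first, this also honours the "key them in priority order" convention.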

**csda1** · 11-22-2006, 06:19 PM

Re: Can this routine be made quicker!

Chris,

Whether or not an LQO is possible, rather than testing each record in the inner loop, evaluate the outer loop's table values for comparison, then run a query on the fields in the inner loop's table. The query will go much faster than fetching record by record. After the query, process the records found with a global update (which is faster than changing records individually, at least for query counts probably bigger than 3), then move to the next record in the outer loop. It will still take time, but I'd guess it should be faster by a factor of 100 or more.
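The shape of that "query, then global update" loop, sketched in Python. The field names (match1, match2, area, area_name, std, match_cnt) come from Tom's code in this thread; the substring test is an assumption. A Python list comprehension does not reproduce the speed of Alpha Five's query engine, so this shows only the structure, not the gain.

```python
def query_then_bulk_update(process_rows, match_rows):
    """For each match row, select all qualifying process rows in one step
    (the "query"), then apply the change to the whole result set at once
    (the "global update"), instead of fetching and editing row by row.
    """
    for m in match_rows:
        # Evaluate the outer record's values once, outside the filter.
        m1, m2 = m["match1"].strip(), m["match2"].strip()
        found = [p for p in process_rows
                 if m1 in p["address"] and m2 in p["address"]]
        # Copy the match row's values onto every row the query returned.
        for p in found:
            p.update(area=m["area"], area_name=m["area_name"], std=m["std"])
        m["match_cnt"] = len(found)
    return process_rows, match_rows
```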

**Tom Cone Jr** · 11-22-2006, 08:45 PM

Re: Can this routine be made quicker!

Chris,

If one assumes the format of the data will remain unchanged, it's possible to custom-craft a very fast routine. Using your sample, define a new index called "byTownPostal" using this expression:

Code:

REMSPECIAL(WORD(ADDRESS,4," ",3))

Build the index.
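What that index key computes, approximated in Python. The behaviour of WORD() (return `count` delimiter-separated words starting at word `start`, rejoined with the delimiter) and of REMSPECIAL() (drop every character that is not a letter or digit) is assumed here, not taken from Alpha Five's documentation; the helper names are hypothetical.

```python
import re


def word(text, start, delim=" ", count=1):
    """Assumed WORD(): `count` words from 1-based word `start`, rejoined with `delim`."""
    parts = text.split(delim)
    return delim.join(parts[start - 1:start - 1 + count])


def remspecial(text):
    """Assumed REMSPECIAL(): keep only letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def by_town_postal(address):
    """Approximation of the index expression REMSPECIAL(WORD(ADDRESS,4," ",3))."""
    return remspecial(word(address, 4, " ", 3))
```

The key only makes sense if town and postcode always occupy words 4 to 6 of the address, which is the fixed-format assumption Tom states.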

Then run this code to find the matches:

Code:

t1=toseconds(time())

' Reset match_cnt to 0 on every match table record
m_tbl=table.open("match",FILE_RW_EXCLUSIVE)
total_records=m_tbl.records_get()
update.fields = 1
update.field1 = "match_cnt"
update.expr1 = "0"
m_tbl.update()

'xbasic_wait_for_idle()
' Open the process table and make ByTownPostal the primary index
p_tbl=table.open("process_file",FILE_RW_EXCLUSIVE)
p_tbl.index_primary_put("ByTownPostal")

m_tbl.fetch_first()
p_tbl.fetch_first()

p_tbl.batch_begin()

' For each match record, build the search key and seek it in the index
While .not. m_tbl.fetch_eof()
	'load keys once
	'vc_m1 = trim(Match->match1)
	'vc_m2 = trim(Match->match2)	
	srchKey = remspecial(alltrim(Match->match1))+alltrim(match->match2)
	'trace.writeln(srchKey)
	result = P_tbl.fetch_find(srchKey)
	if result > 0 then 'find successful
		matchcounter = 0
		' Walk forward while consecutive index entries share the same key,
		' copying the area details onto each process record
		While (srchKey = remspecial(word(p_tbl.address,4," ",3))) .and. .not. p_tbl.fetch_eof()
			p_tbl.change_begin()
			p_tbl.area =m_tbl.area
			p_tbl.area_name =m_tbl.area_name
			p_tbl.std =m_tbl.std
			p_tbl.change_end(.t.)
			matchcounter = matchcounter + 1
			p_tbl.fetch_next()
		End While
		m_tbl.change_begin()
		m_tbl.match_cnt = matchcounter   'write last counter value, instead of each counter value
		m_tbl.change_end(.t.)		
	end if	
	m_tbl.fetch_next()
	p_tbl.fetch_first()
	statusbar.percent(m_tbl.recno(),total_records)
End While
p_tbl.batch_end()

p_tbl.close()
m_tbl.close()

t2=toseconds(time())
XMsg=ui_msg_Box("","Process Finished"+chr(10)+ltrim(str(t2-t1))+" seconds",ui_ok)
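The core of Tom's routine is "seek the key in a sorted index, then read forward while the key still matches." A Python sketch of that pattern, using a sorted list of (key, record number) pairs as a hypothetical stand-in for the ByTownPostal index and `bisect_left` in place of fetch_find():

```python
from bisect import bisect_left


def build_index(keys):
    """Sorted (key, record_number) pairs; record numbers start at 1."""
    return sorted((k, recno) for recno, k in enumerate(keys, start=1))


def seek_and_scan(index, search_key):
    """Jump to the first entry >= search_key, then walk forward while
    entries still equal the key. Returns the matching record numbers."""
    # (search_key,) sorts before every (search_key, recno) pair
    pos = bisect_left(index, (search_key,))
    hits = []
    while pos < len(index) and index[pos][0] == search_key:
        hits.append(index[pos][1])
        pos += 1
    return hits
```

Each seek costs roughly log2(n) comparisons plus one per hit, instead of one per process record, which is why this version is so much faster than the original nested loop.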

**Chris.Tanti** · 11-23-2006, 06:56 AM

Re: Can this routine be made quicker!

Steve / Ira / Tom,

Thanks a lot, guys. I am going to try all your suggestions. I think it would be a good idea to filter out the matched records as suggested; occurrences of match files with possible overlaps are extremely rare.

I have tried Tom's version of the code, and it's incredibly quick (I initially thought it didn't work!). The only problem is that the key values may not always be in that sequence; they could literally be matching on any part of the string (there could be sub-areas, or even partial street names). If I can find a situation where the match is based on a fixed format, then I think this is an incredibly efficient way of handling the matching. As it stands, I think I will first try filtering out the matched records to reduce the sample size.

**Tom Cone Jr** · 11-23-2006, 07:10 AM

Re: Can this routine be made quicker!

Quote:

they could literally be matching on any part of the string (there could be sub areas, or even partial street names)

This wasn't clear to me from the top. The approach I built will work only if the search key and data format remain fixed.

What you're seeing is the same difference that existed back in DOS dBase between FIND (which uses an index) and LOCATE. The indexed FIND approach was always much faster, but required a rigid structure. The LOCATE approach requires no structure but is much slower.
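The FIND/LOCATE trade-off in miniature, as a Python sketch (hypothetical names, not dBase or Alpha Five code): FIND binary-searches a sorted key list and needs the exact key, while LOCATE accepts any condition, such as "postcode fragment appears anywhere", but must test records one by one.

```python
from bisect import bisect_left


def find(sorted_keys, key):
    """Indexed FIND: binary search a sorted key list; needs the exact key."""
    pos = bisect_left(sorted_keys, key)
    return pos if pos < len(sorted_keys) and sorted_keys[pos] == key else -1


def locate(records, predicate):
    """LOCATE: test each record in turn; any condition works, cost grows with table size."""
    for pos, rec in enumerate(records):
        if predicate(rec):
            return pos
    return -1
```

A substring-anywhere match like Chris's falls on the LOCATE side, which is why reducing the number of records scanned (for example with Steve's Found flag) is the more promising lever there.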

If you tell us more about the application, and describe the permitted ways users can populate the match table, maybe we can suggest other ideas for you. It would also help to know whether the structure and content of the Process_file table is fixed.

-- tom

The Alpha Software Forum Participation Guidelines
