Greetings, fellow developers!
I have a task to accomplish and I thought I would bounce it off this list and see if someone had a better approach to the situation. I have
V11 available to work with.
I am working on a specialized auditing project, and one of the tasks is to identify "government" customers that this client sells to. Normally,
this is not too tough, as the customer file may be coded (flagged) with a customer type field denoting "government" accounts, and our task
is to browse through the list of those accounts to verify that they are in fact "government" accounts and to look for others in the remaining portion
of the customer file that have not been so noted. Usually this is anywhere from a few hundred to ten thousand records and can easily be done in
a tool such as Excel. But not this project!
The Customer File holds in excess of 1 million customers and is over 1 GB of data. There is also no customer type field with a value
for government accounts.
So the approach I am considering is the following:
-I have loaded the entire Customer File into A5.
-I have built another table containing typical government entity naming (words) such as:
City, County, Parish, University, AAFES, Army, Navy... etc.
Now I was thinking about bouncing the "term list" against the names in the Customer Table and "Marking" those records.
I will still have to scroll through and determine whether the "marked" records are in fact potential accounts. An example of what
I will have to deal with is: City of Chicago... Chicago, City of... Fun City... the latter is most likely NOT a government
account even though it contains the word "city" in it.
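For illustration, here is a minimal sketch of the marking step written in Python rather than in A5, since I am not certain of the equivalent A5 calls. It assumes the term list is just the seven words named above (the rest are elided), and the names `TERMS`, `TERM_PATTERN` and `mark` are made up for the example. Matching on whole words keeps a name like "Velocity Sports" from being marked by "city", but "Fun City" still gets marked and has to be un-marked by hand.

```python
import re

# Hypothetical term list: only the seven terms named in the post.
TERMS = ["City", "County", "Parish", "University", "AAFES", "Army", "Navy"]

# One case-insensitive pattern; \b limits matches to whole words.
TERM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TERMS) + r")\b",
    re.IGNORECASE,
)

def mark(name):
    """Return True when the name contains any term as a whole word."""
    return TERM_PATTERN.search(name) is not None

names = ["City of Chicago", "Chicago, City of", "Fun City", "Velocity Sports"]
marked = [n for n in names if mark(n)]
print(marked)
```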
Browsing through the "marked" records and "un-marking" items that are in all likelihood not government accounts should leave
me with a "marked" list that I could either export OR add a coding field to for future reference.
Since the "term list" is about 40 items and the customer file is in excess of 1 million records, I expect that this process will take a while to
run in the first place... and hopefully it will not select an excessive number of possible matches (fingers crossed here).
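One way to keep the run time down is to combine all the terms into a single pattern and read the customer file once, rather than making one pass per term. The sketch below is in Python, not A5, and assumes the customer file has been exported to CSV with a column called `name`. The column names, the sample rows and the `scan` function are hypothetical. It also tallies hits per term, which would show early on whether a word like "City" is producing too many possible matches.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical term list: only the seven terms named in the post.
TERMS = ["City", "County", "Parish", "University", "AAFES", "Army", "Navy"]
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE
)

def scan(reader, name_field="name"):
    """Read the rows once; return (marked_rows, hits_per_term)."""
    marked = []
    hits = Counter()
    for row in reader:
        # Distinct terms found in this name, lower-cased for counting.
        found = {m.lower() for m in PATTERN.findall(row[name_field])}
        if found:
            marked.append(row)
            hits.update(found)
    return marked, hits

# Made-up sample standing in for the exported customer file.
sample = io.StringIO(
    "id,name\n"
    "1,City of Chicago\n"
    "2,Fun City\n"
    "3,Acme Tools\n"
    "4,Cook County Army Depot\n"
)
marked, hits = scan(csv.DictReader(sample))
print(len(marked), dict(hits))
```

For the real file, the `io.StringIO` sample would be replaced by an open file handle, so only the marked rows are held in memory.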
Anyone have what they think might be a better approach that they would be willing to share?
Regards,
Keith