PDA

View Full Version : Removing characters from text field


ABC123

John Cunradi
02-04-2015, 05:44 PM
I am importing a text file into a table.
The trouble is that the text file has the single-quote character at the beginning and ending of some fields.
Sometimes it's on a character-field or sometimes a number-field but it doesn't apply using any logic that makes sense to me.

How do I remove the single-quote character from either the original text file or from the table created when I import that text file ?
I can personally use Notepad to clean up the mess but I need something automated so my customers can do it.

Thanks for the help.

DaveM
02-04-2015, 06:02 PM
remspecial() comes to mind.

If using notepad, replace works like " with nothing

John Cunradi
02-04-2015, 06:20 PM
That did the trick. Thanks Dave.

Al Buchholz
02-04-2015, 07:03 PM
That did the trick. Thanks Dave.

Did you use remspecial()?

It's a great function, but may remove more than you want. Especially in a comma delimited file...

Another option is strtran() or it's other variations.

Glad to hear it's working for you.

MoGrace
02-05-2015, 11:08 AM
Hi Al,
Quick question:
In a fixed width text file, is there an A5 function that can insert a special character to use as a delimiter before importing to a table?

Al Buchholz
02-05-2015, 11:11 AM
Hi Al,
Quick question:
In a fixed width text file, is there an A5 function that can insert a special character to use as a delimiter before importing to a table?

Quick response - why?

You can map the import for a file with fixed width fields and save it for future use.

Stan Mathews
02-05-2015, 11:24 AM
Here's what I do.


in_fil = file.open("P:\executive\hdb\mdi tbl\statement.txt",FILE_RW_EXCLUSIVE)
out_fil = file.create("P:\executive\hdb\mdi tbl\stmt_impt.txt",FILE_RW_EXCLUSIVE)
msk = space(6)+"~"+space(40)+"~"+space(30)+"~"+space(20)+"~"+space(2)+"~"+space(5)+"~"+space(8)+"~"+space(5)+"~"+space(9)+"~"
msk = msk+space(1)+"~"+space(14)+"~"+space(8)+"~"+space(8)+"~"+space(5)
while .not. in_fil.eof()
in_ln = in_fil.read_line()
out_ln = mask(in_ln,msk)
out_fil.write_line(out_ln)
end while
out_fil.flush()
out_fil.close()
in_fil.close()


The input file has no delimiters, just one long line of characters per record. The fields map to the space(#) positions in the msk variable. The file. methods are very fast.

MoGrace
02-05-2015, 11:29 AM
Quick response - why?

You can map the import for a file with fixed width fields and save it for future use.I know I am supposed to be able to do this, but when I tried it the import genie skewed the data and I could not get it mapped correctly (v11). Apparently a space between fields is not enough for the genie unless character fields are also in quotes.

MoGrace
02-05-2015, 11:34 AM
Hi Stan,
A mask? How clever is that! I take it the space() is the field size in the text file... so many functions I have never found!

Thank you!

I have an old DOS inventory program we use and cannot get any data out of it except thru its reports, this will be a big help.

Have you got one for removing the (fixed) headers and printer control codes too?

Stan Mathews
02-05-2015, 11:41 AM
Not sure what control codes would be found. Be glad to look at a sample file.

As for headers, you know how many header lines there are each time? Read them and don't write them....


in_fil = file.open("P:\executive\hdb\mdi tbl\statement.txt",FILE_RW_EXCLUSIVE)
out_fil = file.create("P:\executive\hdb\mdi tbl\stmt_impt.txt",FILE_RW_EXCLUSIVE)
msk = space(6)+"~"+space(40)+"~"+space(30)+"~"+space(20)+"~"+space(2)+"~"+space(5)+"~"+space(8)+"~"+space(5)+"~"+space(9)+"~"
msk = msk+space(1)+"~"+space(14)+"~"+space(8)+"~"+space(8)+"~"+space(5)
while .not. in_fil.eof()
'three header lines to be discarded
for qx = 1 to 3
in_fil.read_line()
next qx
'now ready for line 4 of the original file, good data starts
in_ln = in_fil.read_line()
out_ln = mask(in_ln,msk)
out_fil.write_line(out_ln)
end while
out_fil.flush()
out_fil.close()
in_fil.close()

Which naturally enough leads us to adding our own header line of field names.


in_fil = file.open("P:\executive\hdb\mdi tbl\statement.txt",FILE_RW_EXCLUSIVE)
out_fil = file.create("P:\executive\hdb\mdi tbl\stmt_impt.txt",FILE_RW_EXCLUSIVE)
msk = space(6)+"~"+space(40)+"~"+space(30)+"~"+space(20)+"~"+space(2)+"~"+space(5)+"~"+space(8)+"~"+space(5)+"~"+space(9)+"~"
msk = msk+space(1)+"~"+space(14)+"~"+space(8)+"~"+space(8)+"~"+space(5)
'write a field name header line
hd_ln = "Id~name~addr~city~st~zip~occupation"
out_fil.write_ln(hd_ln)
while .not. in_fil.eof()
'three header lines to be discarded
for qx = 1 to 3
in_fil.read_line()
next qx
'now ready for line 4 of the original file, good data starts
in_ln = in_fil.read_line()
out_ln = mask(in_ln,msk)
out_fil.write_line(out_ln)
end while
out_fil.flush()
out_fil.close()
in_fil.close()

John Cunradi
02-05-2015, 11:51 AM
There's a lot of great ideas here. here' the code that worked as an expression of an update: remspecial(Scanning_Upc_)

Al Buchholz
02-05-2015, 11:55 AM
I know I am supposed to be able to do this, but when I tried it the import genie skewed the data and I could not get it mapped correctly (v11). Apparently a space between fields is not enough for the genie unless character fields are also in quotes.

Did you try the table format instead of character delimited? (and did you skip the genie and use the create button instead? with the Table ASCII option)

Stan always has a cool way of fixing stuff......

MoGrace
02-05-2015, 12:36 PM
@ Stan: Here is an example report from 2011 no less (shows how long I have been trying to do this... ;) Its already a bit cleaned up but the control code appears on line 61 & 102 - it is chr(12) I think for FF. Notice the header on page 1 is 12 lines, but on page 2 it is 11. The footer is consistent at 5 lines at the end followed by another FF.

39691

@ AL: No I didn't try it without the genie, but I will...

@ John: Hope you don't mind me cutting in on your thread...seemed related to me!

Al Buchholz
02-05-2015, 12:40 PM
@ Stan: Here is an example report from 2011 no less (shows how long I have been trying to do this... ;) Its already a bit cleaned up but the control code appears on line 102 - it is chr(12) I think for FF. Notice the header on page 1 is 12 lines, but on page 2 it is 11. The footer is consistent at 5 lines at the end followed by another FF.

39691

@ AL: No I didn't try it without the genie, but I will...

It's more than a 'normal' import.

Gonna need some line by line processing either prior to the import or record selecting after an import. Seeing some blank values in the first field when it's not a data line.
Or you want the first field to have a length of 1 and select if the value is I....

MoGrace
02-05-2015, 01:10 PM
I am working out the pseudo code to figure out the steps I need to take...brb

Using Another Notepad, I can see there are 103 total lines in the file - the last line is blank as is the first line.

The header is a consistent 11 lines on each page
Each page has 60 lines or less followed by a FF line
The Footer is 4 lines and only appears at the end, followed by a FF and blank line.

Removing the first and last blank lines would be first

Finding the control codes - chr(12) and removing them would be next.

Now I can get a line_count for the remaining file and remove the last 4 lines of the footer
and update the linecount. Also my page line count is now reduced by 2 - for the initial blank and the FF at the end makes my first page 59 lines instead of 61 (counting the FF as a line)

So by using FOR...NEXT I should be able to skip the headers like Stan showed and write the masked data to a file ready for importing.

Hopefully this will also work on a longer report....otherwise I might be able to take a page at a time in my script and append each page to the output file?

The bigger problem now is to get my users to input the data in the DOS program correctly.
That 'date' field in the report is actually tagged onto the Yard Desc field and the 'Lot" field is a combination of Lot No and Vendor ID. But its a start!

Stan Mathews
02-05-2015, 01:31 PM
I think Al's on track with the "I". Adjust your paths to suit.


in_fil = file.open("P:\example.txt",FILE_RW_EXCLUSIVE)
out_fil = file.create("P:\example_fixed.txt",FILE_RW_EXCLUSIVE)
msk = space(26)+"~"+space(22)+"~"+space(9)+"~"+space(10)+"~"+space(13)
hd_ln = "customer~yarn_desc~lot~dt~lot~lbs"
out_fil.write_line(hd_ln)
while .not. in_fil.eof()
in_ln = strtran(in_fil.read_line(),chr(12),"")
if left(in_ln,1) = "I"
out_ln = substr(mask(in_ln,msk),5)
out_fil.write_line(out_ln)
end if
end while
out_fil.flush()
out_fil.close()
in_fil.close()

MoGrace
02-05-2015, 01:54 PM
Stan - how are you looping thru the text file?

Stan Mathews
02-05-2015, 02:32 PM
The .READ_LINE() method retrieves text data from the current file offset to the end of the line (marked by a carriage return and line feed ).

The offset is then the beginning of the next line so the next .READ_LINE() reads the next line. Not so much looping as just reading every line until the eof() is found. The initial opening of the file positions the offset at the zeroth position.

Note: We don't fetch_eof() just eof().

MoGrace
02-05-2015, 03:18 PM
I'm sure I will have more questions when I have had a chance to try it out - this has been V E R Y helpful - thanks again.