Guest | Text file parsing -
06-04-2007, 09:47 AM
Hi,
I'm trying to parse a text file that is more than 2 MB in size. I'm
using the following sample code:
Open "c:\sim1.txt" For Input As #1
Do While Not EOF(1)
Input #1, Data
If (InStr(Data, "Summary")) Then
str = str & Data
End If
Loop
Close #1
str is a string.
The text file has more than 20,000 lines. I need to read the values
from each of these lines and apply some business rules. Depending on
which rules a line meets, I need to classify the contents of sim1.txt
into 4 different files. The problem is that the program takes a long
time to run. The code shown above does not even include the business
rules, and it is already slow. The same program written in Java takes
very little time. Can anybody suggest ideas or an alternative
solution?
Thanking you,
Regards,
Ratnakar Pedagani
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
"Ratnakar Pedagani" <EMAIL REMOVED> wrote in message
news:EMAIL REMOVED om...
| Hi,
|
| I'm trying to parse the text file, which is of size more than 2mb. I'm
| using the following sample code
|
| Open "c:\sim1.txt" For Input As #1
| Do While Not EOF(1)
| Input #1, Data
|
| If (InStr(Data, "Summary")) Then
| str = str & Data
| End If
| Loop
| Close #1
|
The line
str = str & Data
is building a very large string, which has to be copied into new memory
each time through the loop.
There are different ways to improve this, depending on the situation.
If possible, open the four output files before starting the loop. Then
read each line, decide if it goes into one of the output files, and
write it there if so, before continuing the loop. This would avoid the
large string altogether.
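A minimal sketch of that streaming approach, assuming a hypothetical
ClassifyLine function that stands in for the business rules and made-up
output names out1.txt to out4.txt. It uses Line Input # rather than
Input #, because Input # splits its input at commas and so does not
always return a whole line.

```vb
Option Explicit

' Sketch only: ClassifyLine and the output file names are hypothetical
' stand-ins for the real business rules and destinations.
Public Sub SplitIntoFourFiles(ByVal inPath As String, ByVal outFolder As String)
    Dim inFile As Integer
    Dim outFile(1 To 4) As Integer
    Dim lineText As String
    Dim category As Long
    Dim i As Long

    ' Open all four outputs once, before the read loop starts.
    For i = 1 To 4
        outFile(i) = FreeFile
        Open outFolder & "\out" & i & ".txt" For Output As #outFile(i)
    Next i

    inFile = FreeFile
    Open inPath For Input As #inFile
    Do While Not EOF(inFile)
        Line Input #inFile, lineText        ' reads one whole line
        category = ClassifyLine(lineText)   ' 0 means "skip this line"
        If category >= 1 And category <= 4 Then
            Print #outFile(category), lineText
        End If
    Loop
    Close #inFile

    For i = 1 To 4
        Close #outFile(i)
    Next i
End Sub

' Hypothetical rule: replace the body with the real business rules.
Public Function ClassifyLine(ByVal lineText As String) As Long
    If InStr(lineText, "Summary") > 0 Then
        ClassifyLine = 1
    Else
        ClassifyLine = 0
    End If
End Function
```

Each line is written out as soon as it is classified, so no string
ever grows beyond the length of one input line.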
If you need to gather all the information before making any decisions,
then you should try a different way of storing all the strings. Setting
up an array of strings, and using ReDim to increase its size as needed,
would be the simplest. You still have to allocate a lot of string space,
but at least you don't have to keep copying strings around. This is the
technique used by some string builder classes in other languages, and
probably in Java.
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
It is a lot more efficient to create four long strings of spaces in
advance, one for each category of information, and to use the Mid$
statement to copy each matching piece of text into the relevant one.
Something like this:
Dim pString1 As String ' buffer string
Dim pMax1 As Long      ' length of the buffer string
Dim pCurr1 As Long     ' next free position in pString1
Dim pLen1 As Long      ' length of the input string
Dim pTxt1 As String    ' input string

' Set up the pre-sized buffer for one output string;
' each output category must have its own.
pString1 = Space$(5000)
pMax1 = 5000
pCurr1 = 1

' Set up four loops, one for each category of information.
Do While....
    pTxt1 = "newstring1"
    pLen1 = Len(pTxt1)

    ' See if the buffer needs to be extended.
    If pCurr1 + pLen1 > pMax1 Then
        pString1 = pString1 & Space$(10 * pLen1)
        pMax1 = Len(pString1)
    End If

    Mid$(pString1, pCurr1) = pTxt1
    pCurr1 = pCurr1 + pLen1
Loop

' When done, use RTrim$ to remove the excess spaces from pString1.
' (Left$(pString1, pCurr1 - 1) does the same and also keeps any
' trailing spaces that belong to the data.)
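The same idea can be wrapped in a small helper so that each of the four
categories only needs its own buffer string and counter. This is a
sketch under assumptions: the names BufferAppend and BufferResult are
made up for illustration, and the buffer grows by doubling rather than
by 10 times the piece length.

```vb
Option Explicit

' Hypothetical helper: appends piece to a pre-sized buffer.
' buf holds the buffer, used holds the number of characters filled so far.
Public Sub BufferAppend(ByRef buf As String, ByRef used As Long, ByVal piece As String)
    Dim needed As Long
    Dim newSize As Long

    needed = used + Len(piece)
    If needed > Len(buf) Then
        ' Grow by doubling so that reallocation happens only rarely.
        newSize = Len(buf) * 2
        If newSize < needed Then newSize = needed
        If newSize < 5000 Then newSize = 5000
        buf = buf & Space$(newSize - Len(buf))
    End If

    If Len(piece) > 0 Then
        ' Mid$ as a statement overwrites characters in place.
        Mid$(buf, used + 1, Len(piece)) = piece
    End If
    used = needed
End Sub

' Returns only the filled part of the buffer.
Public Function BufferResult(ByRef buf As String, ByVal used As Long) As String
    ' Left$ keeps trailing spaces that were part of the data.
    BufferResult = Left$(buf, used)
End Function
```

Usage would be one pair of variables per category, for example
BufferAppend summaryBuf, summaryUsed, lineText inside the read loop and
BufferResult(summaryBuf, summaryUsed) once the loop has finished.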
cheers, soeren
"Steve Gerrard" <EMAIL REMOVED> wrote in message
news:7bOdnfSimqyVILvcRVn-EMAIL REMOVED...
> The line
> str = str & Data
> is building a very large string, which has to be copied into new memory
> each time through the loop.
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
Hi,
I'm very impressed with the solution you gave me. The program I wrote
takes 1 min 10 sec; the one you suggested takes 7 seconds. Is there
any alternative that would take even less time than what you
suggested?
Thanking you,
Regards,
Ratnakar Pedagani
"S.W. Rasmussen" <EMAIL REMOVED> wrote in message news:<4126e041$0$237$EMAIL REMOVED >...
> It is a lot more efficient to create the four long empty strings in advance
> to hold the four categories of information and to use the mid$ function to
> insert the matching text into the relevant string of the four.
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
On 23 Aug 2004 08:43:44 -0700, EMAIL REMOVED (Ratnakar
Pedagani) wrote:
>Hi,
>
>I'm very much impressed with the solution that you gave it to me. The
>program that i have written is taking 1 min 10 sec time. the program
>that u suggested is taking 7 secs of time. is there any alternative
>solution which takes lesser time than u suggested earlier.
One simple method is to look at the Len (buffer length) setting on the
Open line.
A few more come to mind, but to some extent they have already been
covered, i.e. buffer file reads and writes (up to about 100k)
and use Mid$() as much as possible.
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
"Ratnakar Pedagani" <EMAIL REMOVED> wrote in message
news:EMAIL REMOVED om...
| Hi,
|
| I'm very much impressed with the solution that you gave it to me. The
| program that i have written is taking 1 min 10 sec time. the program
| that u suggested is taking 7 secs of time. is there any alternative
| solution which takes lesser time than u suggested earlier.
|
| Thanking you,
| Regards,
| Ratnakar Pedagani
|
An add on to Jerry's post:
I would consider trying
Dim strInput As String
Dim strLines() As String
Dim nFile As Integer
Dim nLen As Long
Dim n As Long

nFile = FreeFile 'better than just using 1
' Get needs Binary mode; it raises "Bad file mode" on a file opened For Input
Open "c:\sim1.txt" For Binary Access Read As #nFile
nLen = LOF(nFile)
strInput = Space$(nLen)
Get #nFile, , strInput
Close #nFile

strLines = Split(strInput, vbNewLine)

For n = LBound(strLines) To UBound(strLines)
    'process each strLines(n) as before
Next n
This reads the whole file in at once, then breaks it into an array of
strings, one for each line. If the file is really big, this would use up
a lot of memory, but often it runs faster than reading in each line.
Guest | Re: Text file parsing -
06-04-2007, 09:47 AM
Ratnakar,
If you benchmark any of the improvements to the Mid$() insertion method, I
would be interested in the result. I use text parsing in several routines
and any improvement in speed is obviously welcome.
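For anyone who wants to try such a comparison, here is a rough timing
sketch. It is an assumption-laden illustration, not a measurement from
this thread: the function name TimeBuild and the sample line are made
up, and it only compares repeated concatenation against filling an
array and calling Join. Timer counts seconds since midnight, so a run
that crosses midnight would give a meaningless figure.

```vb
Option Explicit

' Returns the elapsed seconds for building one big string from
' lineCount copies of a sample line (lineCount must be at least 1).
' useJoin = False: repeated concatenation, as in the original question.
' useJoin = True:  fill a string array once, then Join it.
Public Function TimeBuild(ByVal lineCount As Long, ByVal useJoin As Boolean) As Single
    Const sampleLine As String = "Summary line of made-up test data"
    Dim parts() As String
    Dim result As String
    Dim i As Long
    Dim started As Single

    started = Timer
    If useJoin Then
        ReDim parts(0 To lineCount - 1)
        For i = 0 To lineCount - 1
            parts(i) = sampleLine
        Next i
        result = Join(parts, vbNewLine)
    Else
        For i = 1 To lineCount
            result = result & sampleLine & vbNewLine
        Next i
    End If
    TimeBuild = Timer - started
End Function
```

Calling Debug.Print TimeBuild(20000, False), TimeBuild(20000, True)
from the Immediate window prints both timings side by side; the Mid$
buffer method could be timed the same way by adding a third branch.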
Soeren