Parser in Perl

Sergei Steshenko · 01-26-2010, 07:57 PM

Quote:

Originally Posted by MTK358

Next, I don't understand extract_bracketed()'s third parameter.

???????????????

From the documentation:

Quote:

and a prefix pattern. As before, a missing prefix defaults to optional whitespace

So what exactly is not clear ?

MTK358 · 01-27-2010, 07:36 AM

So to extract the pattern "keyword { ... }", you would use this, right?

Code:

extract_bracketed($text, '{}', 'keyword\s*');

MTK358 · 01-27-2010, 08:50 AM

OK, I figured that out.

Now, how do I find every instance of 'keyword { ... }' in a string and store it separately, including the prefix?

Sergei Steshenko · 01-27-2010, 12:06 PM

Quote:

Originally Posted by MTK358

So to extract the pattern "keyword { ... }", you would use this, right?

Code:

extract_bracketed($text, '{}', 'keyword\s*');

From reading the documentation this is what I understand. And it even works for me.

Sergei Steshenko · 01-27-2010, 12:11 PM

Quote:

Originally Posted by MTK358

OK, I figured that out.

Now, how do I find every instance of 'keyword { ... }' in a string and store it separately, including the prefix?

The documentation tells you about $extracted, $remainder, $prefix, doesn't it ? And if you agree with me that it does, doesn't the $remainder item ring a bell ? I.e. don't you think $remainder can be fed back in as $text as many times as one wants ?
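
A minimal sketch of that loop, not the only way to do it. It assumes the input may contain unrelated text between blocks, so it first uses a plain regex to find the next "keyword {" and trims the string to that point, because the prefix pattern has to match where extraction starts. The sample string and the names @found and $rest are made up for illustration.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical sample input; "keyword" is the placeholder used in this thread.
my $text = 'junk keyword { a { b } } more keyword {c} tail';

my @found;
my $rest = $text;
while ($rest =~ /\bkeyword\s*\{/) {
    # Drop everything before the next candidate so that the prefix
    # pattern can match at the very start of the string.
    $rest = substr($rest, $-[0]);

    my ($extracted, $remainder, $prefix) =
        extract_bracketed($rest, '{}', 'keyword\s*');
    last unless defined $extracted;    # unbalanced braces: give up

    push @found, $prefix . $extracted; # store the block including its prefix
    $rest = $remainder;                # feed the remainder back in as the text
}

print "$_\n" for @found;
```

Each element of @found holds one block together with its prefix. Braces inside quoted strings are not treated specially here, since only '{}' is passed as the delimiter set.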

sundialsvcs · 01-27-2010, 01:08 PM

You might find it more convenient to use one of the actual parsers that are available in the CPAN library.

You define a grammar for whatever language that you want to process, and the parser does all the heavy-lifting for you.

http://search.cpan.org ... it's your best friend.

MTK358 · 01-27-2010, 01:13 PM

I still think the easiest way would be to write my own simple, specialized parser for it, that would even understand C style comments and give more meaningful, compiler-like error messages.

But how do you iterate through the chars in a string in Perl?

Sergei Steshenko · 01-27-2010, 01:38 PM

Quote:

Originally Posted by MTK358

I still think the easiest way would be to write my own simple, specialized parser for it, that would even understand C style comments and give more meaningful, compiler-like error messages.

But how do you iterate through the chars in a string in Perl?

perldoc -f substr
perldoc -f length
.

Plus remember that $prefix is returned too.

And no, don't reinvent the wheel - so far Text::Balanced has all you need, and you'll have to add minimum glue code.

There is Perl code understanding "C" comments around, and it's a FAQ.

Also, read GNU CPP documentation

(
http://gcc.gnu.org/onlinedocs/cpp/
http://tigcc.ticalc.org/doc/cpp.html
)
- for me GNU CPP is the default tool for getting rid of "C" comments.
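
For the character-iteration question, the two perldoc entries above combine into a loop like the following sketch. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $string = 'a{b}';    # hypothetical sample input
my @chars;

# length() gives the number of characters; substr() with a length of 1
# returns the single character at offset $i.
for my $i (0 .. length($string) - 1) {
    my $char = substr($string, $i, 1);
    push @chars, $char;
}

print join(' ', @chars), "\n";
```

For an empty string the range 0 .. -1 is empty, so the loop body never runs.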

MTK358 · 01-27-2010, 01:44 PM

Quote:

Originally Posted by Sergei Steshenko

perldoc -f substr
perldoc -f length
.

Plus remember that $prefix is returned too.

And no, don't reinvent the wheel - so far Text::Balanced has all you need, and you'll have to add minimum glue code.

There is Perl code understanding "C" comments around, and it's a FAQ.

Also, read GNU CPP documentation

(
http://gcc.gnu.org/onlinedocs/cpp/
http://tigcc.ticalc.org/doc/cpp.html
)
- for me GNU CPP is the default tool for getting rid of "C" comments.

Yeah, I am trying to make my own parser and it just seems to get ugly quite fast, so maybe I should return to Text::Balanced.

And I didn't know that you can use CPP to strip out C-style comments. I'll read on it and see if I can get it to work.

MTK358 · 01-27-2010, 01:56 PM

Basically, I wonder how to make CPP only remove comments and merge lines ending with backslashes, but not process macros, includes, etc.?

Sergei Steshenko · 01-27-2010, 02:04 PM

Quote:

Originally Posted by MTK358

Basically, I wonder how to make CPP only remove comments and merge lines ending with backslashes, but not process macros, includes, etc.?

The answer is here: http://gcc.gnu.org/onlinedocs/cpp/In...tml#Invocation .

MTK358 · 01-27-2010, 03:19 PM

I can't seem to figure it out from there.

Sergei Steshenko · 01-27-2010, 03:50 PM

Quote:

Originally Posted by MTK358

I can't seem to figure it out from there.

Really ? Did you read ? Did you look for all occurrences of the word "comment" ?

...

Let me tell you something. When I studied English, I pretty quickly discovered that the appropriate meaning of a word unknown to me was not among the first meanings given by the dictionary.

The same applies to SW documentation - often the needed info is not in the beginning. According to my understanding, it is possible to suppress macro expansion while still processing comments.

MTK358 · 01-27-2010, 03:56 PM

Using the browser's find function, I discovered "-fpreprocessed" may do the job. But the problem is that it doesn't splice escaped newlines.

EDIT: that might not be an issue, I can probably splice escaped newlines in Perl using s/\\\n//g.

EDIT2: I've tested the splicing trick, and it seems to work just as described in the CPP manual.
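
A small sketch of that substitution on its own, assuming Unix line endings (a backslash followed by "\r\n" would not match). The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: the first line ends with a backslash-newline pair.
my $source = "a \\\nb\nc\n";

# Copy the text, then delete every backslash that is immediately
# followed by a newline, together with that newline.
(my $spliced = $source) =~ s/\\\n//g;

print $spliced;
```

Only the escaped newline is removed; ordinary newlines stay where they are.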

Now, how do you find all instances of "keyword { ... }" and process them with Text::Balanced?

MTK358 · 01-27-2010, 08:09 PM

Anyway, here is my current code. It first slurps the file, splices escaped newlines, writes the result to a new file with the extension changed to ".c", runs cpp to strip the comments into a temp file, and then replaces the ".c" file with the temp file.

The remarkable thing is that it worked perfectly the first try!!!

Code:

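#!/usr/bin/env perl
use strict;
use warnings;

foreach my $filename (@ARGV) {
	# Slurp the whole input file; local $/ keeps the change scoped to this block.
	open(my $in, '<', $filename) or die "Cannot open $filename: $!";
	my $file = do { local $/; <$in> };
	close($in);

	# Splice escaped newlines (a backslash immediately followed by a newline).
	$file =~ s/\\\n//g;

	# Same name with the extension changed to ".c".
	(my $outfilename = $filename) =~ s/(.*)\..*/$1.c/;
	open(my $out, '>', $outfilename) or die "Cannot write $outfilename: $!";
	print $out $file;
	close($out) or die "Cannot close $outfilename: $!";

	# Let cpp strip the comments, then replace the .c file with its output.
	# The list form of system() avoids shell quoting problems in file names.
	system('cpp', '-fpreprocessed', $outfilename, '-o', "$outfilename.temp") == 0
		or die "cpp failed on $outfilename";
	rename("$outfilename.temp", $outfilename)
		or die "Cannot rename $outfilename.temp: $!";
}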