Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
It is stated on page #25 of the book Compiler Design in C, by Allen Holub, that the symbols (NT and T, i.e. nonterminals and terminals) on the right-hand side must be separated by blanks.
Otherwise, if blank spaces are themselves to be included, the requisite number of blank spaces must be written in quotes, for example " " for a single blank space.
I want to know how this is implemented in C code, or whether it is handled in lex.
In the code provided by @NevemTeve here, in the integrated lexer, I could not find the code that separates symbols based on spaces.
Quote:
Originally Posted by ajiten
It is stated on page #25 of the book Compiler Design in C, by Allen Holub, that the symbols (NT and T, i.e. nonterminals and terminals) on the right-hand side must be separated by blanks.
Otherwise, if blank spaces are themselves to be included, the requisite number of blank spaces must be written in quotes, for example " " for a single blank space.
Your reference to page #25 of the Holub book and the posted image match my print version neither by page nor by section. Please check that reference if needed.
Quote:
Originally Posted by ajiten
I want to know how this is implemented in C code, or whether it is handled in lex.
In Lex/Flex you would handle embedded spaces just as stated in your question: include the requisite number of spaces, enclosed in quotes, where needed in the relevant patterns.
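To make that concrete without requiring flex, here is a minimal hand-written C analogue of two lex rules, one of which contains a quoted space (in a flex specification they would be the patterns "end if" and "end"). The token names and the `end if` keyword are hypothetical, chosen only for illustration, and the sketch handles literal patterns only, not regular expressions. Like lex, it prefers the longest matching pattern.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_EOF = 0, TOK_END, TOK_END_IF, TOK_OTHER };

/* Literal patterns, analogous to the lex rules "end if" and "end".
 * The embedded space in "end if" is part of the pattern, so exactly one
 * blank must appear between the two words for it to match. */
static const struct { const char *text; int token; } patterns[] = {
    { "end if", TOK_END_IF },
    { "end",    TOK_END    },
};

/* Return the token starting at input[*pos] and advance *pos past it.
 * Like lex, prefer the longest matching pattern; a character that no
 * pattern matches is returned on its own as TOK_OTHER. */
int next_token(const char *input, size_t *pos)
{
    const char *p = input + *pos;
    size_t best_len = 0;
    int best_tok = TOK_OTHER;

    if (*p == '\0')
        return TOK_EOF;
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        size_t len = strlen(patterns[i].text);
        if (len > best_len && strncmp(p, patterns[i].text, len) == 0) {
            best_len = len;
            best_tok = patterns[i].token;
        }
    }
    *pos += best_len ? best_len : 1;
    return best_tok;
}
```

With this table, `end if` (one blank) yields a single TOK_END_IF, while `end  if` (two blanks) yields TOK_END followed by separate characters, because the pattern demands exactly one space.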
Quote:
Originally Posted by ajiten
In the code provided by @NevemTeve here, in the integrated lexer, I could not find the code that separates symbols based on spaces.
After a quick (non-critical) look at the linked code, it appears that no cases are defined where a space is significant; spaces are simply discarded in all cases (in LexGet(...)). So there is no handling for tokens that include, or are separated by, a specific number of spaces.
I defer to NevemTeve if I have not understood that code correctly.
I admit I don't understand what the question is. In this minimalistic program spaces have no importance.
In C, for example, 'a+ + +b' and 'a+ ++b' are both valid and mean different things: a+(+(+b)) vs. a+(++b). Also, 'int x' cannot be written without whitespace. These cases are handled by the lexical analyzer (lexer).
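A minimal sketch of how a lexer can discard whitespace and still rely on it to separate tokens, assuming a toy subset of C that has only identifiers, `+` and `++`. The function name `tokenize` and the `|`-separated output are hypothetical. The key rule, which real C lexers also follow, is to take the longest token that fits at each position (often called "maximal munch").

```c
#include <ctype.h>
#include <string.h>

/* Tokenize a tiny subset of C (identifiers, "+" and "++") into out,
 * separating tokens with '|'.  Whitespace is discarded, but it still ends
 * a token: the lexer takes the longest token that fits ("maximal munch"),
 * so "++" is only formed from two adjacent '+' characters. */
void tokenize(const char *s, char *out, size_t outsize)
{
    size_t n = 0;

    out[0] = '\0';
    while (*s) {
        const char *start = s;
        if (isspace((unsigned char)*s)) {      /* separator, not a token */
            s++;
            continue;
        }
        if (isalpha((unsigned char)*s) || *s == '_') {
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else if (s[0] == '+' && s[1] == '+') {
            s += 2;                            /* longest match: "++" beats "+" */
        } else {
            s++;                               /* "+" or any other single char */
        }
        size_t len = (size_t)(s - start);
        if (n + len + 2 > outsize)             /* no room left: stop early */
            break;
        if (n > 0)
            out[n++] = '|';
        memcpy(out + n, start, len);
        n += len;
        out[n] = '\0';
    }
}
```

With this rule `a+++b` lexes as `a ++ + b`, so writing `a + ++b` requires the space; likewise `intx` is one identifier, while `int x` is two tokens.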
Quote:
Originally Posted by ajiten
It is stated on page #25 of the book Compiler Design in C, by Allen Holub, that the symbols (NT and T, i.e. nonterminals and terminals) on the right-hand side must be separated by blanks.
Otherwise, if blank spaces are themselves to be included, the requisite number of blank spaces must be written in quotes, for example " " for a single blank space.
I want to know how this is implemented in C code, or whether it is handled in lex.
This tells the human reader how to read the grammar in the book, not how the parser/C program reads its input.
Quote:
This tells the human reader how to read the grammar in the book, not how the parser/C program reads its input.
That may be part of the reason the question is obscure, but another look at the difficult-to-read scanned page shows this...
Code:
expr -> t e r m "|" t e r m
The spaces on the right side of the preceding production are there to separate
successive terminal and nonterminal symbols... If, however, we wanted to specify
a space as a terminal symbol, we can simply enclose the space in quotes. For
example the right side of the production
expr -> term " " term
consists of two occurrences of the nonterminal term surrounding the terminal
symbol " " (the space character)
So it is showing explicitly how you might include a space as a terminal symbol in the grammar, by quoting it.
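If the grammar itself were read by a program, as a parser generator reads its input file, the notation in the quoted passage could be split into symbols roughly as follows: blanks separate symbols, and a quoted string is a single terminal even when it contains a blank. This is a hypothetical helper written for illustration, not code from the book or from this thread; it assumes symbol names of at most 15 characters and does not handle escapes inside quotes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split the right-hand side of a production written
 * in the book's notation.  Blanks separate symbols; a quoted string such
 * as " " or "|" is one terminal symbol, even if it contains a blank.
 * Each symbol (quotes included) is copied into syms[i], truncated to 15
 * characters.  Returns the number of symbols found. */
int split_rhs(const char *rhs, char syms[][16], int max_syms)
{
    int count = 0;

    while (*rhs && count < max_syms) {
        const char *start;
        if (*rhs == ' ') {                 /* blank: separator only */
            rhs++;
            continue;
        }
        start = rhs;
        if (*rhs == '"') {                 /* quoted terminal */
            rhs++;
            while (*rhs && *rhs != '"')
                rhs++;
            if (*rhs == '"')
                rhs++;                     /* include the closing quote */
        } else {                           /* bare symbol name */
            while (*rhs && *rhs != ' ' && *rhs != '"')
                rhs++;
        }
        size_t len = (size_t)(rhs - start);
        if (len > 15)
            len = 15;
        memcpy(syms[count], start, len);
        syms[count][len] = '\0';
        count++;
    }
    return count;
}
```

For the right side `term " " term` this yields three symbols: `term`, `" "` and `term`, while the unquoted blanks around `" "` are consumed as separators.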
Given that context, the question only makes sense as asking how to pass the space character from the lexer to the parser, and the answer is the same: include a quoted space as a pattern in the lexer that returns the character as its own token.
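A minimal sketch of that answer, assuming a toy input where `term` is simply a run of letters or digits (in a real grammar it would be a nonterminal with its own rules). The token names, the `lex` function and the `parse_expr` recursive-descent function are hypothetical. The lexer returns a single space as its own T_SPACE token, and the parser checks the production expr -> term " " term.

```c
#include <ctype.h>

/* Hypothetical token codes for this sketch. */
enum token { T_EOF, T_TERM, T_SPACE, T_ERROR };

static const char *src;   /* input cursor shared by lexer and parser */

/* Lexer: a single ' ' is returned as its own token instead of being
 * skipped; a run of letters/digits stands in for "term". */
static enum token lex(void)
{
    if (*src == '\0')
        return T_EOF;
    if (*src == ' ') {
        src++;
        return T_SPACE;
    }
    if (isalnum((unsigned char)*src)) {
        while (isalnum((unsigned char)*src))
            src++;
        return T_TERM;
    }
    src++;
    return T_ERROR;
}

/* Parser for: expr -> term " " term
 * Returns 1 if the whole input matches, 0 otherwise.  The && operator
 * evaluates left to right, so the tokens are read in order. */
int parse_expr(const char *input)
{
    src = input;
    return lex() == T_TERM
        && lex() == T_SPACE
        && lex() == T_TERM
        && lex() == T_EOF;
}
```

The sketch also shows one consequence of treating the space as a token: `a b` is accepted, but `a  b` (two blanks) and `a b ` (a trailing blank) are rejected, so every place where optional blanks are allowed would need its own rule.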
The text says nothing about the consequences of treating a space character in the lexer as a token instead of as a separator (probably expecting readers to think about that themselves).
Note to ajiten: I have asked before that you post code and example texts as plain text when asking questions, for several reasons, including that doing so makes it much easier for others to follow and understand the discussion. The confusion in this thread amply demonstrates one problem that can result from posting images rather than text.
You replied at the time that you post the scans or screenshots for your own convenience. But this forum exists not only for your convenience, but for the convenience and usefulness of all, both now and into the future.
So I am asking again that you please end this behavior and make the effort to post minimal, relevant code and example text inline rather than as screenshots or external links in order to provide that convenience to others, posting images or links only when the content cannot be communicated as text.
Last edited by astrogeek; 11-21-2023 at 06:39 PM.
Reason: tpoys, topsy, typos
For what it may be worth, in one compiler that I was involved with we actually replaced "the lexer" with a hand-coded equivalent, and it worked quite well. The language wasn't complicated, and so, neither was the code. All we really needed was getch() and a one-character ungetch().
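A minimal sketch of that style of hand-coded lexer, assuming the input comes from a string rather than stdin so it is easy to test. The `getch()`/`ungetch()` names follow the post (they are defined here, and are unrelated to the curses routine of the same name); `lexer_init`, `get_token` and the token codes are hypothetical. One character of pushback suffices because the lexer only ever reads one character past the end of a number or identifier.

```c
#include <ctype.h>
#include <stdio.h>   /* for EOF */

static const char *input;   /* hypothetical string source instead of stdin */
static int pushed;          /* nonzero if a character was pushed back */
static int pushed_char;     /* the pushed-back character (may be EOF) */

void lexer_init(const char *s)
{
    input = s;
    pushed = 0;
}

/* Read one character, returning EOF at the end of the string. */
static int getch(void)
{
    if (pushed) {
        pushed = 0;
        return pushed_char;
    }
    if (*input == '\0')
        return EOF;
    return (unsigned char)*input++;
}

/* Push one character back; only one level of pushback is supported. */
static void ungetch(int c)
{
    pushed = 1;
    pushed_char = c;
}

enum { TK_EOF, TK_NUMBER, TK_IDENT, TK_CHAR };

/* Return the next token, skipping blanks, tabs and newlines.  For a number,
 * its value is stored in *value (overflow is not checked).  The character
 * that ends a number or identifier is pushed back with ungetch(). */
int get_token(long *value)
{
    int c;

    while ((c = getch()) == ' ' || c == '\t' || c == '\n')
        ;
    if (c == EOF)
        return TK_EOF;
    if (isdigit(c)) {
        long n = 0;
        for (; isdigit(c); c = getch())
            n = n * 10 + (c - '0');
        ungetch(c);
        *value = n;
        return TK_NUMBER;
    }
    if (isalpha(c)) {
        while (isalnum(c = getch()))
            ;
        ungetch(c);
        return TK_IDENT;
    }
    return TK_CHAR;   /* any other single character, e.g. an operator */
}
```

For the input `x1 42+`, successive calls return TK_IDENT, TK_NUMBER (with the value 42), TK_CHAR for `+`, then TK_EOF.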
"Lexers" are purely confined to "character-twiddling," and so the usual regular-expression-centric solutions are not the only way to get the job done. "Just get the job done, and move on." "The Parser" is where the real magic is done.
But also: if "spaces" are "a symbol" in your language (as in Python), perhaps the simplest way to handle this is "in the [hand-coded ...] lexer." Simply arrange for it to return "a token" that consists of "spaces." (If the number of spaces is important, as in Python, you will have a little more work to do on the top side ...)
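A minimal sketch of that extra work, modeled loosely on how Python's tokenizer keeps a stack of indentation levels and emits INDENT and DEDENT tokens. The struct, the function names, the depth limit and the return convention are hypothetical; tabs, blank lines and continuation lines are ignored.

```c
/* Hypothetical sketch of Python-style indentation handling: the lexer
 * counts the spaces at the start of each line and compares the count with
 * a stack of enclosing indentation levels. */
#define MAX_DEPTH 64
/* Distinct from any valid dedent count, since depth is below MAX_DEPTH. */
#define INDENT_ERROR (-1000)

struct indent_state {
    int levels[MAX_DEPTH];   /* levels[0] is always 0 (no indentation) */
    int depth;               /* number of valid entries in levels[] */
};

void indent_init(struct indent_state *st)
{
    st->levels[0] = 0;
    st->depth = 1;
}

/* Count the spaces at the start of one line. */
int leading_spaces(const char *line)
{
    int n = 0;
    while (line[n] == ' ')
        n++;
    return n;
}

/* Given the indentation of a new line, return 1 for one INDENT token,
 * a negative count for that many DEDENT tokens, or 0 if unchanged.
 * Returns INDENT_ERROR on a dedent to a level that was never opened,
 * or if nesting exceeds MAX_DEPTH. */
int indent_change(struct indent_state *st, int spaces)
{
    int top = st->levels[st->depth - 1];

    if (spaces > top) {
        if (st->depth == MAX_DEPTH)
            return INDENT_ERROR;
        st->levels[st->depth++] = spaces;
        return 1;
    }
    int dedents = 0;
    while (st->depth > 1 && spaces < st->levels[st->depth - 1]) {
        st->depth--;
        dedents--;
    }
    if (spaces != st->levels[st->depth - 1])
        return INDENT_ERROR;                 /* inconsistent dedent */
    return dedents;
}
```

For lines indented by 0, 2, 4 and then 0 spaces, `indent_change` returns 0, 1, 1 and -2 (two DEDENTs); dedenting to a level that was never opened returns INDENT_ERROR.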
In the end: "don't spend too much time staring at 'compiler books.'" (None of which I have ever found to be readable.) Just find a way to get the project done. "Mathematical abstractions" are good ... only to a point.
Last edited by sundialsvcs; 11-28-2023 at 07:59 AM.