LinuxQuestions.org
Old 11-18-2023, 05:08 AM   #1
ajiten
Member
 
Registered: Jun 2023
Posts: 377

Rep: Reputation: 4
Handling space in lexer.


Page 25 of Compiler Design in C, by Allen Holub, states that the symbols (nonterminals and terminals) on the right-hand side of a production must be separated by blanks.
If a blank is itself meant to be a symbol, it must be written in quotes with the required number of blanks, e.g. " " for a single blank.

I would like to know how this is implemented in C code, or how it is handled in lex.

In the code provided by @NevemTeve here, I could not find the part of the integrated lexer that separates symbols based on spaces.
Attached image: holub pg 25.png (176.8 KB)
 
Old 11-20-2023, 05:31 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4206
Quote:
Originally Posted by ajiten
Page 25 of Compiler Design in C, by Allen Holub, states that the symbols (nonterminals and terminals) on the right-hand side of a production must be separated by blanks.
If a blank is itself meant to be a symbol, it must be written in quotes with the required number of blanks, e.g. " " for a single blank.

Your reference to page 25 of Holub's book and the posted image do not match my print version, either by page or by section. Please check that reference if needed.

Quote:
Originally Posted by ajiten
I would like to know how this is implemented in C code, or how it is handled in lex.

In Lex/Flex you would handle an embedded space just as stated in your question: include the requisite number of spaces, enclosed in quotes, wherever they are needed in the relevant patterns.
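
As a rough illustration of that advice: a Flex rule such as "end if" (in quotes) matches only when exactly one space sits between the two words. A hand-written C equivalent might look like the sketch below. The phrase "end if", the token code TOK_END_IF and the function match_end_if are assumptions made up for this example; they do not come from Holub's book or from the linked code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch. */
enum { TOK_NONE = 0, TOK_END_IF = 1 };

/*
 * Returns TOK_END_IF and stores the matched length in *len when the input
 * starts with the literal text "end if" (exactly one space, as a quoted
 * space in a Lex pattern would require). Otherwise returns TOK_NONE.
 */
static int match_end_if(const char *input, size_t *len)
{
    static const char literal[] = "end if";   /* the space is significant */
    size_t n = sizeof literal - 1;            /* length without the '\0' */

    assert(input != NULL && len != NULL);

    if (strncmp(input, literal, n) == 0) {
        *len = n;
        return TOK_END_IF;
    }
    *len = 0;
    return TOK_NONE;
}
```

Called on "end if;" this reports a 6-character match, while "end  if" (two spaces) and "endif" are both rejected, because the single space is part of the pattern.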

Quote:
Originally Posted by ajiten
In the code provided by @NevemTeve here, I could not find the part of the integrated lexer that separates symbols based on spaces.

After a quick (non-critical) look at the linked code, it appears that no cases are defined where a space is significant; spaces are simply discarded in all cases (in LexGet(...)). So there is no handling for tokens that include, or are separated by, a specific number of spaces.
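
For comparison, a lexer that discards spaces usually does so in a small loop before each token. The sketch below is not the linked code and makes no claim about how LexGet(...) is written; the function skip_and_get_token and its token rules (alphanumeric runs, otherwise single characters) are assumptions chosen only to show the idea.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: copy the next token from *src into buf and advance
 * *src past it. Leading whitespace is skipped and never reported.
 * Returns the token length, or 0 at end of input.
 */
static size_t skip_and_get_token(const char **src, char *buf, size_t bufsize)
{
    const char *p;
    size_t n = 0;

    assert(src != NULL && *src != NULL && buf != NULL && bufsize >= 2);
    p = *src;

    while (isspace((unsigned char)*p))      /* spaces are simply discarded */
        p++;

    if (isalnum((unsigned char)*p)) {
        /* identifier or number: a run of alphanumeric characters */
        while (isalnum((unsigned char)*p) && n + 1 < bufsize)
            buf[n++] = *p++;
    } else if (*p != '\0') {
        buf[n++] = *p++;                    /* any other character stands alone */
    }

    buf[n] = '\0';
    *src = p;
    return n;
}
```

With this approach the inputs "a+b" and "a  +   b" produce the same three tokens, so the parser never learns how many spaces were present.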

I defer to NevemTeve if I have not understood that code correctly.

Last edited by astrogeek; 11-20-2023 at 05:32 PM.
 
Old 11-20-2023, 11:12 PM   #3
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,881
Blog Entries: 1

Rep: Reputation: 1871
I admit I don't understand what the question is. In this minimalistic program spaces have no importance.
In C, for example, 'a+ + +b' and 'a+ ++b' are both valid and mean different things: a+(+(+b)) vs a+(++b). Also, 'int x' cannot be written without whitespace. These cases are handled by the lexical analyzer (the lexer).
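
The two expressions can be tried directly in C. This is only a small sketch; the function names are made up for the example, and each function works on its own copy of b.

```c
/* a + (+(+b)): two unary plus operators, b is left unchanged */
static int unary_plus_twice(int a, int b)
{
    return a+ + +b;
}

/* a + (++b): the lexer sees "++" as one token, so b is incremented first */
static int pre_increment(int a, int b)
{
    return a+ ++b;
}
```

With a = 1 and b = 2, unary_plus_twice returns 3 and pre_increment returns 4. The only difference between the two source texts is where the space sits.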
 
Old 11-21-2023, 07:31 AM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,786

Rep: Reputation: 2083
Quote:
Originally Posted by ajiten
Page 25 of Compiler Design in C, by Allen Holub, states that the symbols (nonterminals and terminals) on the right-hand side of a production must be separated by blanks.
If a blank is itself meant to be a symbol, it must be written in quotes with the required number of blanks, e.g. " " for a single blank.

I would like to know how this is implemented in C code, or how it is handled in lex.

This tells the human reader how to read the grammar in the book, not how the parser/C program reads its input.
 
Old 11-21-2023, 03:37 PM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,269
Blog Entries: 24

Rep: Reputation: 4206
Quote:
Originally Posted by ntubski
This tells the human reader how to read the grammar in the book, not how the parser/C program reads its input.

That may be part of the reason for the obscurity of the question, but another look at the difficult-to-read scanned page says this...

Code:
    expr -> t e r m "|" t e r m

The spaces on the right side of the preceding production are there to separate
successive terminal and nonterminal symbols... If, however, we wanted to specify
a space as a terminal symbol, we can simply enclose the space in quotes. For
example the right side of the production

   expr -> term " " term

consists of two occurrences of the nonterminal term surrounding the terminal
symbol " " (the space character)
So it is showing explicitly how you might include a space as a terminal symbol in the grammar, by quoting it.

Given that context, the question only makes sense as asking how to pass the space character from the lexer to the parser. The answer is the same: include a quoted space as a pattern in the lexer, with an action that returns the character as its own token.

The text says nothing about the consequences of treating a space character as a token instead of as a separator in the lexer (probably expecting the reader to think about it themselves).
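
A minimal C sketch of such a lexer follows, as one possible reading of the above rather than anything taken from the book. The token codes SYM_EOF, SYM_TERM, SYM_SPACE and SYM_OTHER and the function next_symbol are hypothetical; a real parser would normally take its token codes from the parser generator.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum symbol { SYM_EOF, SYM_TERM, SYM_SPACE, SYM_OTHER };

/*
 * Returns the next token from *src and advances *src past it.
 * A space is reported as SYM_SPACE rather than skipped, so a rule like
 *     expr -> term " " term
 * can see it in the token stream.
 */
static enum symbol next_symbol(const char **src)
{
    const char *p;
    enum symbol sym;

    assert(src != NULL && *src != NULL);
    p = *src;

    if (*p == '\0') {
        sym = SYM_EOF;
    } else if (*p == ' ') {
        p++;
        sym = SYM_SPACE;                    /* one token per space character */
    } else if (isalnum((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))  /* a run of alphanumerics is a term */
            p++;
        sym = SYM_TERM;
    } else {
        p++;
        sym = SYM_OTHER;
    }

    *src = p;
    return sym;
}
```

One consequence shows up immediately: the input "ab  cd" (two spaces) yields SYM_TERM, SYM_SPACE, SYM_SPACE, SYM_TERM, which the production above would not accept. Once a space is a token, every space in the input has to be accounted for somewhere in the grammar.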

Note to ajiten: I have asked before that you post code and example texts as plain text when asking questions, for several reasons, including that doing so makes it much easier for others to follow and understand the discussion. The confusion in this thread amply demonstrates one problem that can result from posting images rather than text.

You replied at the time that you post the scans or screenshots for your own convenience. But this forum exists not only for your convenience, but for the convenience and usefulness of all, both now and into the future.

So I am asking again that you please end this behavior and make the effort to post minimal, relevant code and example text inline rather than as screenshots or external links in order to provide that convenience to others, posting images or links only when the content cannot be communicated as text.

Last edited by astrogeek; 11-21-2023 at 06:39 PM. Reason: tpoys, topsy, typos
 
Old 11-28-2023, 07:46 AM   #6
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,691
Blog Entries: 4

Rep: Reputation: 3947
For what it may be worth, in one compiler that I was involved with we actually replaced "the lexer" with a hand-coded equivalent, and it worked quite well. The language wasn't complicated, and so, neither was the code. All we really needed was getch() and a one-character ungetch().
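
To give a feel for that style, here is a small sketch of a hand-coded scanner built on a get-character routine and a one-character pushback. It is not the code from the compiler mentioned above; the struct source, the names lex_getch, lex_ungetch and read_number, and the choice to read from a string are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>   /* for EOF */

/* Hypothetical input source: a string plus a one-character pushback slot. */
struct source {
    const char *text;   /* remaining input */
    int pushed;         /* pushed-back character, or EOF if none */
};

static int lex_getch(struct source *s)
{
    if (s->pushed != EOF) {
        int c = s->pushed;
        s->pushed = EOF;
        return c;
    }
    return *s->text ? (unsigned char)*s->text++ : EOF;
}

static void lex_ungetch(struct source *s, int c)
{
    assert(s->pushed == EOF);   /* only one character of pushback */
    s->pushed = c;
}

/*
 * Skips leading blanks and reads an unsigned decimal number.
 * Returns -1 at end of input or if the next character is not a digit
 * (that character is pushed back for the caller).
 */
static long read_number(struct source *s)
{
    long value = 0;
    int c;

    do {
        c = lex_getch(s);
    } while (c == ' ' || c == '\t' || c == '\n');

    if (c == EOF)
        return -1;
    if (!isdigit(c)) {
        lex_ungetch(s, c);
        return -1;
    }
    while (c != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
        c = lex_getch(s);
    }
    if (c != EOF)
        lex_ungetch(s, c);      /* read one character too far: push it back */
    return value;
}
```

For the input "  42+7", read_number returns 42, the next lex_getch returns '+', and a second read_number returns 7. The pushback is what lets the number scanner stop cleanly at the '+' without losing it.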

"Lexers" are purely confined to "character-twiddling," and so the usual regular-expression-centric solutions are not the only way to get the job done. "Just get the job done, and move on." "The Parser" is where the real magic is done.

But also: if "spaces" are "a symbol" in your language (as in Python), perhaps the simplest way to handle this is "in the [hand-coded ...] lexer." Simply arrange for it to return "a token" that consists of "spaces." (If the number of spaces is important, as in Python, you will have a little more work to do on the top side ...)
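
As a sketch of that idea (with invented names, and far simpler than what Python itself does), a hand-coded lexer can fold a run of spaces into a single token that carries the count. The struct space_token, the codes IND_END, IND_SPACES and IND_CHAR, and the function next_space_token are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token: its kind plus, for IND_SPACES, how many spaces were seen. */
enum space_kind { IND_END, IND_SPACES, IND_CHAR };

struct space_token {
    enum space_kind kind;
    size_t count;   /* number of spaces, for IND_SPACES */
    char ch;        /* the character, for IND_CHAR */
};

/* Returns the next token from *src and advances *src past it. */
static struct space_token next_space_token(const char **src)
{
    struct space_token t = { IND_END, 0, '\0' };
    const char *p;

    assert(src != NULL && *src != NULL);
    p = *src;

    if (*p == ' ') {
        while (*p == ' ') {     /* the whole run of spaces becomes one token */
            p++;
            t.count++;
        }
        t.kind = IND_SPACES;
    } else if (*p != '\0') {
        t.kind = IND_CHAR;
        t.ch = *p++;
    }

    *src = p;
    return t;
}
```

For "    x y" this gives a spaces token with count 4, the character 'x', a spaces token with count 1, and the character 'y'. Deciding what a count of 4 means (for example, comparing it with the previous line's indentation) is the extra work left to the caller.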

In the end: "don't spend too much time staring at 'compiler books.'" (None of which I have ever found to be readable.) Just find a way to get the project done. "Mathematical abstractions" are good ... only to a point.

Last edited by sundialsvcs; 11-28-2023 at 07:59 AM.
 
  

