Regex pattern for C-style comments

Today I am going to discuss my attempts to match C-style comments with a regex.

For example,

//This is a comment

/***This is also
*** a comment ***/

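Initially, I came up with a regex for /*…*/ style comments:
/\*.*\*/
Well, the above expression fails on comments that span several lines, because . does not match a newline by default in most regex engines. Being greedy, .* can also run from the first /* on a line to the last */ on that line. It could not match a comment like:

/***This is also
*** a comment ***/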
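Here is a minimal sketch of how the first pattern behaves, assuming Python's re module (the post does not name an engine, so the choice of Python and the helper name matches are assumptions for illustration).

```python
import re

# First attempt from the post. In Python, '.' does not match '\n'
# unless re.DOTALL is set, and '.*' is greedy.
NAIVE = re.compile(r"/\*.*\*/")

single_line = "/*** This is a comment ***/"
multi_line = "/***This is also\n*** a comment ***/"
two_on_a_line = "a = 1; /* one */ b = 2; /* two */"


def matches(pattern, text):
    """Return every substring of text matched by pattern (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


print(matches(NAIVE, single_line))    # the whole single-line comment
print(matches(NAIVE, multi_line))     # [] because '.' stops at the newline
print(matches(NAIVE, two_on_a_line))  # one match spanning both comments
```

In this engine the single-line comment is found, the two-line comment is missed, and two comments on one line are merged into a single match together with the code between them.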

I googled and came across http://ostermiller.org/findcomment.html where I found the regex:
/\*(.|[\r\n])*?\*/
This one matches comments that span several lines: (.|[\r\n]) accepts newlines, and the lazy *? stops at the first */. But it also matches the following /*…*/ sequences, which are not really comments:

s = "This is a string: /* with a comment */";
//comment1 /*
foo();
//comment2 */
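To make the false positives concrete, here is a small sketch that runs the quoted pattern over the examples from this post. It assumes Python's re module; the variable names are only for illustration.

```python
import re

# Pattern quoted in the post: (.|[\r\n]) stands for "any character,
# newlines included", and *? makes the repetition lazy.
BLOCK = re.compile(r"/\*(.|[\r\n])*?\*/")

source = (
    "/***This is also\n"
    "*** a comment ***/\n"
    's = "This is a string: /* with a comment */";\n'
    "//comment1 /*\n"
    "foo();\n"
    "//comment2 */\n"
)

# group(0) is the whole match; findall() would return only the inner group.
found = [m.group(0) for m in BLOCK.finditer(source)]
for text in found:
    print(repr(text))
```

The first match is the real block comment. The second sits inside the string literal, and the third runs from the /* in comment1 to the */ in comment2, swallowing foo(); on the way.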

I then worked on a regex for //… style comments: //[^\n]*\n
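A short sketch of the line-comment pattern, again assuming Python's re module. Note that the pattern includes the trailing \n in the match and therefore needs one to be present. The second pattern, LINE_NO_NL, is my own variation for comparison and is not from the post.

```python
import re

# Line-comment pattern from the post: requires a newline after the comment.
LINE = re.compile(r"//[^\n]*\n")
# Assumed variation (not from the post): stop before the newline instead.
LINE_NO_NL = re.compile(r"//[^\n]*")

with_newline = "x = 1; //This is a comment\ny = 2;\n"
no_trailing_newline = "x = 1; //last line"

print(LINE.findall(with_newline))               # match includes the '\n'
print(LINE.findall(no_trailing_newline))        # [] since there is no '\n'
print(LINE_NO_NL.findall(no_trailing_newline))  # the comment is found
```

So a // comment on the very last line of a file that lacks a final newline is missed by the original pattern.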

Then I combined the two regexes with alternation (|), and my pattern became:

//[^\n]*\n|/\*(.|[\r\n])*?\*/

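Now this pattern finds both //… and /*…*/ style comments. Because the engine scans left to right and a // comment consumes the rest of its line, a /* or */ inside a // comment no longer starts or ends a match. That avoids false matches on input like: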

//comment1 /*
foo();
//comment2 */
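The sketch below checks the combined pattern on this input, with one genuine block comment appended. It assumes Python's re module, and the extra block comment is made up for the example.

```python
import re

# Combined pattern from the post: line comments OR block comments.
COMBINED = re.compile(r"//[^\n]*\n|/\*(.|[\r\n])*?\*/")

source = (
    "//comment1 /*\n"
    "foo();\n"
    "//comment2 */\n"
    "/* a real\n"
    "   block comment */\n"
)

# The scan is leftmost-first: '//comment1 /*' is consumed as a line comment
# before the '/*' inside it gets a chance to open a block comment.
found = [m.group(0) for m in COMBINED.finditer(source)]
for text in found:
    print(repr(text))
```

Each // line is reported as its own comment, foo(); is left alone, and the real block comment is still matched across its two lines.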

One caveat remains: the /*…*/ sequence inside the string literal in
s = "This is a string: /* with a comment */";
still gets matched. If anyone has a workaround for this issue, please comment.

I hope this helps.


2 thoughts on “Regex pattern for C-style comments”

  1. This should work:
    (?s)\/\*(?:(\*(?!/))|(?:[^\*]))*\*/|//[^\n\r]*(?=[\n\r])

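    (?s) tells . to match newline characters too (this particular pattern contains no ., so the flag is optional here)
    the rest of the regex means:

    /* followed by (a * that is not followed by /, or any character other than *) repeated any number of times, followed by */

    OR

    // followed by any characters other than line breaks, up to but not including the next line break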

    (?!) is a negative lookahead
    (?=) is a positive lookahead

    The positive lookahead is useful to avoid deleting newlines after comments. Please tell me if I missed something, some exercise is always good 🙂

    • It still matches a comment pattern inside a C string 😦, e.g.:
      s = "This is a string: /* with a comment */";

      A workaround could be to combine the comment regex with regexes for the other elements of the language, so that a comment pattern inside a string is consumed by the string regex and never reaches the comment regex.

      Let’s take a simple example:
      ws = regex for whitespace
      cs = above regex for comment
      var = regex for valid variable
      str = regex for a string

      So, we can combine these into a new pattern:
      (ws|cs)*var(ws|cs)*=(ws|cs)*str(ws|cs)*;(ws|cs)*

      This is a very crude example, but it works. Now the pattern knows where to search for comments and where not to. 😉
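The idea in this thread can be sketched in a reduced form. Instead of spelling out a whole statement grammar as in the reply, the sketch below adds only a string regex as an alternative tried alongside the comment regex, so a string literal is consumed as one unit before anything inside it can be taken for a comment. It assumes Python's re module. CS is the comment regex from the first reply with the leading (?s) dropped, since Python expects global inline flags at the start of a pattern and this regex contains no dot. STR, TOKEN and find_comments are hypothetical names of my own.

```python
import re

# Assumed regex for a double-quoted C string with backslash escapes.
STR = r'"(?:\\.|[^"\\\n])*"'
# Comment regex from the first reply, without the leading (?s).
CS = r"\/\*(?:(\*(?!/))|(?:[^\*]))*\*/|//[^\n\r]*(?=[\n\r])"

# Whichever alternative matches at the leftmost position wins, so a string
# is swallowed whole and its contents are never scanned for comments.
TOKEN = re.compile(rf"(?P<string>{STR})|(?P<comment>{CS})")


def find_comments(source):
    """Return comment texts, skipping comment-like text inside strings."""
    return [
        m.group("comment")
        for m in TOKEN.finditer(source)
        if m.group("comment") is not None
    ]


source = (
    's = "This is a string: /* with a comment */"; // real comment\n'
    "/* another\n"
    "   real one */\n"
)

print([m.group(0) for m in re.finditer(CS, source)])  # includes the string part
print(find_comments(source))                          # only the two real comments
```

This is still a sketch rather than a C lexer: character literals such as '"', line continuations and trigraphs are not handled, and a // comment on a final line without a newline is missed because of the lookahead.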
