Today, I am going to discuss my attempts to parse C-style comments.
For example,
//This is a comment
/***This is also
*** a comment ***/
Initially, I came up with a regex for /*…*/ style comments:
/\*.*\*/
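Well, the above expression did not hold up. By default . does not match a newline, so comments that span lines are missed, and .* is greedy, so it can run on to the last */ on a line. In my tests it was not able to parse comments like: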
/*** This is a comment ***/
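Two weaknesses of this first pattern are easy to reproduce. The sketch below is in Python with its standard re module; the language and the sample strings are my own choices for illustration, not what I originally tested with.

```python
import re

# The first attempt: greedy '.*' between the comment delimiters.
naive = re.compile(r"/\*.*\*/")

# Two comments on one line: the greedy '.*' runs to the LAST '*/',
# so a single match swallows the code between the comments as well.
one_line = "/* a */ x = 1; /* b */"
greedy_matches = naive.findall(one_line)

# '.' does not match '\n' by default, so a comment spanning lines is missed.
multi_line = "/***This is also\n*** a comment ***/"
multi_matches = naive.findall(multi_line)

print(greedy_matches)  # ['/* a */ x = 1; /* b */']
print(multi_matches)   # []
```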
I googled and came across http://ostermiller.org/findcomment.html where I found the regex:
/\*(.|[\r\n])*?\*/
This was able to match comments like the above one. But it’d also match the following /*…*/ comments which are not really comments:
s = "This is a string: /* with a comment */";
//comment1 /*
foo();
//comment2 */
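Both the good and the bad behaviour can be checked with a small Python sketch. Python, the helper name find_all, and the variable names are assumptions for illustration only; the pattern is the one quoted above.

```python
import re

# Non-greedy pattern from the page linked above.
block = re.compile(r"/\*(.|[\r\n])*?\*/")

def find_all(pattern, text):
    """Return the full text of every match (hypothetical helper)."""
    # finditer + group(0) is used because findall would return only
    # the contents of the capturing group, not the whole match.
    return [m.group(0) for m in pattern.finditer(text)]

multi_line = "/***This is also\n*** a comment ***/"
in_string = 's = "This is a string: /* with a comment */";'
in_line_comments = "//comment1 /*\nfoo();\n//comment2 */\n"

# The multi-line comment is matched in full ...
print(find_all(block, multi_line))
# ... but so is the text inside the string literal ...
print(find_all(block, in_string))        # ['/* with a comment */']
# ... and the span between the two line comments, foo() included.
print(find_all(block, in_line_comments)) # ['/*\nfoo();\n//comment2 */']
```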
I then worked on a regex for //… style comments: //[^\n]*\n
Then I combined the two regexes with alternation (|), and my pattern became:
//[^\n]*\n|/\*(.|[\r\n])*?\*/
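Now, this pattern finds both //… and /*…*/ style comments. Because the regex engine scans left to right and the //… alternative consumes the rest of the line, a /* or */ inside a line comment no longer starts or ends a block comment match, so the pattern avoids false matches for input like: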
//comment1 /*
foo();
//comment2 */
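Here is a minimal Python sketch of the combined pattern on that input. The language and variable names are assumptions for illustration. It also shows one limitation I am fairly sure of: since the //… part requires a trailing \n, a line comment on the last line of a file with no final newline is not matched.

```python
import re

# Line comments first, then block comments.
combined = re.compile(r"//[^\n]*\n|/\*(.|[\r\n])*?\*/")

source = "//comment1 /*\nfoo();\n//comment2 */\n"
matches = [m.group(0) for m in combined.finditer(source)]
# Each line comment is consumed whole, so the '/*' and '*/' inside
# them never get a chance to pair up around foo().
print(matches)  # ['//comment1 /*\n', '//comment2 */\n']

# No newline after the comment: the '//' alternative cannot match.
no_newline = "foo(); // last line"
tail_matches = [m.group(0) for m in combined.finditer(no_newline)]
print(tail_matches)  # []
```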
One caveat that remains is that the /*…*/ pattern in
s = "This is a string: /* with a comment */";
still gets matched. If anyone has a workaround for this issue, please comment.
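One commonly used approach is to let the same regex also match string and character literals, and then throw those matches away, so that a /* inside a literal is consumed before the comment alternatives ever see it. Below is a rough Python sketch of that idea. The names TOKEN and find_comments are hypothetical, the literal patterns are a simplification of C's rules, and I have dropped the trailing \n from the //… part so that last-line comments are found. It does not handle backslash line continuations or preprocessor oddities.

```python
import re

TOKEN = re.compile(
    r'''
    "(?:\\.|[^"\\])*"      # string literal, allowing escaped characters
  | '(?:\\.|[^'\\])*'      # character literal
  | (?P<comment>
        //[^\n]*           # line comment, up to the end of the line
      | /\*.*?\*/          # block comment, non-greedy, may span lines
    )
    ''',
    re.VERBOSE | re.DOTALL,
)

def find_comments(source):
    """Return comment texts, skipping anything inside string/char literals."""
    return [
        m.group("comment")
        for m in TOKEN.finditer(source)
        if m.group("comment") is not None
    ]

code = 's = "This is a string: /* with a comment */"; // real\n/* block */\n'
print(find_comments(code))  # ['// real', '/* block */']

tricky = "//comment1 /*\nfoo();\n//comment2 */\n"
print(find_comments(tricky))  # ['//comment1 /*', '//comment2 */']
```

The string literal is still matched, but only the named group comment is reported, so the text inside it is never mistaken for a comment.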
I hope this helps.