Sunday, November 4, 2007

RegExp, Java, Email, RFC 2822 & YOU!

I am sure a lot of you have dealt with using regexps for validating email. It's lots of fun. Regexps are extremely readable, make perfect sense the first time you look at them and are very intuitive…. oh what… what’s that? They are not? No crap!

Well, I had to improve one that was pretty limited in what it would allow. After reading a few articles on the best way to do it, I found one on Les Hazlewood's site with a nice Java implementation of the RFC 2822 standard. While it worked OK overall, it still allowed some invalid emails and rejected some valid ones. So I changed his pattern a little:

import java.util.regex.Pattern;

//RFC 2822 atext specials; the hyphen is escaped so it is not read as a range inside [...]
private static final String sp = "!#$%&'*+\\-/=?^_`{|}~";
private static final String ftext = "[a-zA-Z0-9]";
private static final String atext = "[a-zA-Z0-9" + sp + "]";
private static final String atom = atext + "+"; //one or more atext chars
private static final String fatom = ftext + "+"; //one or more letters or digits
private static final String dotAtom = "(\\.|-|_)" + atom; //a '.', '-' or '_' followed by an atom
private static final String localPart = fatom + "(" + dotAtom + ")*"; //one fatom followed by 0 or more dotAtoms.

//RFC 1035 tokens for domain names:
private static final String letter = "[a-zA-Z]";
private static final String letDig = "[a-zA-Z0-9]";
private static final String letDigHyp = "[a-zA-Z0-9-]";
public static final String rfcLabel = letDig + letDigHyp + "{0,61}" + letDig;
private static final String domain = rfcLabel + "((\\.|-)" + rfcLabel + ")*\\." + letter + "{2,6}";

//Combined, these form the full address pattern, based on the RFC 2822 addr-spec:
private static final String addrSpec = "^" + localPart + "@" + domain + "$";

//now compile it:
public static final Pattern VALID_PATTERN = Pattern.compile(addrSpec);
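Two Java escaping details are easy to get wrong when building these fragments as string literals. A regex `\.` (a literal dot) has to be written `"\\."` in Java source; writing `"\\\\."` yields the regex `\\.`, which instead matches a backslash followed by any character. And inside a character class, an unescaped hyphen between two characters defines a range, so `[+-/]` also accepts `,`, `-` and `.`. The small demo below (the class name `EscapingDemo` is just for illustration) checks both cases with `Pattern.matches`:

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Regex \. (a literal dot) must be written "\\." in a Java string literal.
    static final String LITERAL_DOT = "\\.";
    // "\\\\." becomes the regex \\. : a literal backslash followed by ANY character.
    static final String BACKSLASH_ANY = "\\\\.";

    // Unescaped: +-/ is a range from '+' (0x2B) to '/' (0x2F), which also covers ',', '-' and '.'.
    static final String RANGE_CLASS = "[+-/]";
    // Escaped: only the three characters '+', '-' and '/'.
    static final String LITERAL_CLASS = "[+\\-/]";

    public static void main(String[] args) {
        System.out.println(Pattern.matches(LITERAL_DOT, "."));     // true
        System.out.println(Pattern.matches(BACKSLASH_ANY, "."));   // false
        System.out.println(Pattern.matches(BACKSLASH_ANY, "\\x")); // true (input is backslash + 'x')
        System.out.println(Pattern.matches(RANGE_CLASS, ","));     // true
        System.out.println(Pattern.matches(LITERAL_CLASS, ","));   // false
        System.out.println(Pattern.matches(LITERAL_CLASS, "-"));   // true
    }
}
```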


And a method that uses VALID_PATTERN:

public static boolean isValidEmail(String address) {
    //null is not a valid address; matcher(null) would throw a NullPointerException
    return address != null && VALID_PATTERN.matcher(address).matches();
}
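To try the pattern out, here is a minimal self-contained sketch that puts the fragments and the method into one class. The class name `EmailValidator` and the sample addresses are hypothetical, chosen only for illustration; the fragments use `"\\."` for a literal dot and an escaped hyphen in `sp`.

```java
import java.util.regex.Pattern;

// Hypothetical class name; assembles the fragments from the post into one unit.
public class EmailValidator {
    // RFC 2822 atext specials; the hyphen is escaped so it is literal inside [...].
    private static final String sp = "!#$%&'*+\\-/=?^_`{|}~";
    private static final String ftext = "[a-zA-Z0-9]";
    private static final String atext = "[a-zA-Z0-9" + sp + "]";
    private static final String atom = atext + "+";
    private static final String fatom = ftext + "+";
    private static final String dotAtom = "(\\.|-|_)" + atom;
    private static final String localPart = fatom + "(" + dotAtom + ")*";

    // RFC 1035 tokens for domain names.
    private static final String letter = "[a-zA-Z]";
    private static final String letDig = "[a-zA-Z0-9]";
    private static final String letDigHyp = "[a-zA-Z0-9-]";
    private static final String rfcLabel = letDig + letDigHyp + "{0,61}" + letDig;
    private static final String domain = rfcLabel + "((\\.|-)" + rfcLabel + ")*\\." + letter + "{2,6}";

    private static final String addrSpec = "^" + localPart + "@" + domain + "$";
    public static final Pattern VALID_PATTERN = Pattern.compile(addrSpec);

    public static boolean isValidEmail(String address) {
        // Guard against null, which Pattern.matcher would reject with an exception.
        return address != null && VALID_PATTERN.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",       // true
            "first_last@sub.example.org", // true
            "john..doe@example.com",      // false: two separators in a row
            ".john@example.com",          // false: must start with a letter or digit
            "john@example",               // false: no top-level domain
            "john@x.com"                  // false: rfcLabel needs at least two characters
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValidEmail(s));
        }
    }
}
```

The last sample shows a side effect of `rfcLabel` as written: because it requires both a leading and a trailing letter or digit, every domain label must be at least two characters long, so a one-letter label such as `x` in `x.com` is rejected.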


This seems to catch everything I need it to catch while still following the standard. Thanks to the original author for the inspiration. ^.^

2 comments:

  1. I'm just curious - how is it different than mine? Mine is an exact representation of RFC 2822. If yours is different, how is it that it "still follows the standard"?

    I'd very much appreciate any feedback - and then I could combine it to benefit everyone :)

    Cheers,

    Les

  2. It has been a while since I wrote this blog post and I think you have updated yours since then.

    Off the top of my head I can't remember what the difference was. I think it had to do with a missing character pattern.

    Either way, you are the one who gave me what I was looking for, but I remember needing to make a slight tweak to it. A year goes by and I really can't remember.

    I have just moved my blog over to WP from Typo and sucked in all my old articles, so I haven't looked at your original or updated version in quite some time. I do know where credit is due, and it is you, bud! ;)

    ... just looking at yours again and it looks like you took it even further than when I originally saw your article. Hope the link to you gives you more traffic!!! :D
