Sunday, November 4, 2007

RegExp, Java, Email, RFC 2822 & YOU!

I am sure a lot of you have dealt with using regexps for validating email.  It's lots of fun. Regexps are extremely readable, make perfect sense the first time you look at them, and are very intuitive… oh, what… what’s that? They are not? No crap!

Well, I had to improve one that was pretty limited in what it would allow.  After reading a few articles on the best way, I found one at Les Hazlewood's site that had a nice Java implementation of the RFC 2822 standard.  While it worked OK overall, it still accepted some addresses that were not OK and rejected some that were.  So I changed his pattern a little:

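//special characters allowed in atext; the hyphen goes last so the character class reads it literally, not as the range '+' to '/'
private static final String sp = "!#$%&'*+/=?^_`{|}~-";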
private static final String ftext = "[a-zA-Z0-9]";
private static final String atext = "[a-zA-Z0-9" + sp + "]";
private static final String atom = atext + "+"; //one or more atext chars
private static final String fatom = ftext + "+"; //one or more letters or digits
private static final String dotAtom = "(\\.|-|_)" + atom; //a '.', '-' or '_' separator followed by an atom
private static final String localPart = fatom + "(" + dotAtom + ")*"; //one fatom followed by 0 or more dotAtoms.

//RFC 1035 tokens for domain names:
private static final String letter = "[a-zA-Z]";
private static final String letDig = "[a-zA-Z0-9]";
private static final String letDigHyp = "[a-zA-Z0-9-]";
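public static final String rfcLabel = letDig + letDigHyp + "{0,61}" + letDig; //2 to 63 chars, starting and ending with a letter or digit
private static final String domain = rfcLabel + "((\\.|-)" + rfcLabel + ")*\\." + letter + "{2,6}"; //labels joined by '.' or '-', then a TLD of 2 to 6 letters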

//Combined together, these form the email address regexp, based on RFC 2822:
private static final String addrSpec = "^" + localPart + "@" + domain + "$";

//now compile it (needs import java.util.regex.Pattern):
public static final Pattern VALID_PATTERN = Pattern.compile(addrSpec);
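
One thing to watch when a string like sp gets dropped between square brackets: a hyphen sitting between two characters is read as a range, and the range from '+' to '/' covers '+', ',', '-', '.' and '/'. Here is a small sketch of the difference. The class name HyphenRangeDemo and the shortened character lists are made up for illustration and are not part of the pattern above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the validator itself.
final class HyphenRangeDemo {
    // Hyphen between '+' and '/': the class contains the range '+'..'/',
    // which is the five characters + , - . /
    static final Pattern RANGE = Pattern.compile("[*+-/=]");

    // Hyphen placed last: it is a literal hyphen, so the class is * + / = -
    static final Pattern LITERAL = Pattern.compile("[*+/=-]");

    public static void main(String[] args) {
        for (String s : new String[] {",", ".", "-", "+"}) {
            System.out.println("'" + s + "'"
                    + " range=" + RANGE.matcher(s).matches()
                    + " literal=" + LITERAL.matcher(s).matches());
        }
    }
}
```

With the hyphen in the middle, a comma and a dot both sneak into the class. With the hyphen at the end they do not. Escaping it as \\- in the Java string should work too.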


and a method using that pattern:

public static boolean isValidEmail(String address) {
    return VALID_PATTERN.matcher(address).matches();
}
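
To try the whole thing out, here is a minimal self-contained sketch that wraps the pieces in one class and runs a few sample addresses through it. The class name EmailValidatorSketch and the sample addresses are my own picks for illustration. It also assumes that literal dots are written as "\\." in Java source and that the hyphen sits last in sp, so treat it as a sketch under those assumptions rather than the definitive version.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; the name is not from the original code.
final class EmailValidatorSketch {
    // Assumption: hyphen last, so it is a literal inside the character class.
    private static final String sp = "!#$%&'*+/=?^_`{|}~-";
    private static final String ftext = "[a-zA-Z0-9]";
    private static final String atext = "[a-zA-Z0-9" + sp + "]";
    private static final String atom = atext + "+";
    private static final String fatom = ftext + "+";
    // Assumption: "\\." in Java source, i.e. the regex \. for a literal dot.
    private static final String dotAtom = "(\\.|-|_)" + atom;
    private static final String localPart = fatom + "(" + dotAtom + ")*";

    private static final String letter = "[a-zA-Z]";
    private static final String letDig = "[a-zA-Z0-9]";
    private static final String letDigHyp = "[a-zA-Z0-9-]";
    private static final String rfcLabel = letDig + letDigHyp + "{0,61}" + letDig;
    private static final String domain =
            rfcLabel + "((\\.|-)" + rfcLabel + ")*\\." + letter + "{2,6}";

    static final Pattern VALID_PATTERN =
            Pattern.compile("^" + localPart + "@" + domain + "$");

    static boolean isValidEmail(String address) {
        return VALID_PATTERN.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "john.doe@example.com",            // accepted
            "first_last-1@mail.example.co.uk", // accepted
            "john..doe@example.com",           // rejected: two dots in a row
            ".john@example.com",               // rejected: must start with a letter or digit
            "john+tag@example.com",            // rejected: '+' only allowed after a '.', '-' or '_'
            "john@example",                    // rejected: no top-level domain
            "x@a.com"                          // rejected: each label needs at least 2 chars
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValidEmail(s));
        }
    }
}
```

Note the last two rejections: the pattern is stricter than RFC 2822 itself, since it turns away plus-style addresses and single-character domain labels. If you need those, the local part and rfcLabel are the places to loosen.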


This seems to catch all that I need it to catch and still follows the standard.  Thanks to the original author for the inspiration. ^.^