Jan 14 2009

The evil HTML

We all know that HTML emails are more appealing and produce higher click-through rates. So what is the big deal with HTML? Why do spam filters hate it so much?

This post explains some of the basic HTML email techniques used by spammers to hide the real content of the email.

Filters cannot see emails the way we see them. They only see the source code, and they can only guess what it will look like in the email client.

For example, a table that the reader sees as:

cell 1 cell 2

looks like this to the filter:

<table border="1">
<tbody>
<tr>
<td>cell 1</td>
<td>cell 2</td>
</tr>
</tbody>

</table>

Knowing this, keyword-based filters can be easily tricked, as they look for particular keywords within the email. For example, if the word “viagra” appears in an email, it is most likely to be blocked. However, breaking this word into six cells of a table, one letter per cell, keeps a filter that only scans the raw source from recognizing the word.

Example:

v i a g r a

will be seen as:

<table border="0">
<tbody>
<tr>
<td>v</td>
<td>i</td>
<td>a</td>
<td>g</td>
<td>r</td>
<td>a</td>
</tr>
</tbody></table>

So unless “<td>v</td><td>i</td><td>a</td><td>g</td><td>r</td><td>a</td>” is in the spam keyword database, it will be perceived as a simple table with random letters.
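To make the trick concrete, here is a minimal Python sketch of a keyword filter that scans only the raw source. The keyword list and the function name `naive_keyword_match` are hypothetical and made up for illustration; real filters are far more elaborate.

```python
# Hypothetical keyword database, for illustration only.
BLOCKED_KEYWORDS = ["viagra"]


def naive_keyword_match(source: str) -> bool:
    """Return True if a blocked keyword appears verbatim in the raw source."""
    lowered = source.lower()
    return any(keyword in lowered for keyword in BLOCKED_KEYWORDS)


# The word written normally inside a paragraph.
plain_source = "<p>viagra</p>"

# The same word split across six table cells, one letter per cell.
split_source = (
    '<table border="0"><tbody><tr>'
    "<td>v</td><td>i</td><td>a</td><td>g</td><td>r</td><td>a</td>"
    "</tr></tbody></table>"
)

print(naive_keyword_match(plain_source))  # True
print(naive_keyword_match(split_source))  # False
```

The second call returns False because the tags sit between the letters, so the keyword never appears as one unbroken string in the source.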

This is a very basic way to trick a spam filter, and it is very unlikely that any up-to-date filter will let it through. However, it is a great example of what spam filters have to put up with when dealing with HTML.
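One way a filter might defend against this particular trick is to strip the markup and check the remaining text. The sketch below uses Python's standard `html.parser` module; the names `TextExtractor` and `visible_text` are hypothetical, and this is an assumption about how such a check could work, not a description of how any real spam filter does it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for every run of text between tags.
        self.chunks.append(data)


def visible_text(source: str) -> str:
    """Return the text content with all tags and whitespace removed."""
    parser = TextExtractor()
    parser.feed(source)
    parser.close()
    # Joining without a separator and dropping whitespace lets letters
    # that were split across cells line up into a single word again.
    return "".join("".join(parser.chunks).split())


split_source = """
<table border="0">
<tbody>
<tr>
<td>v</td><td>i</td><td>a</td><td>g</td><td>r</td><td>a</td>
</tr>
</tbody></table>
"""

print("viagra" in visible_text(split_source).lower())  # True
```

Removing all whitespace is a deliberately blunt choice: it catches split words, but it can also glue neighbouring innocent words together into an accidental match, so a real filter would need something more careful.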