Response Analytics™ – MailChannels’ powerful AI technology

By MailChannels | 9 minute read

Spammers compromise hosting and email accounts by stealing passwords and leveraging exploits in hosting systems. Once an account is compromised, the spammer sends email through the account, leveraging whatever positive reputation the ISP or hosting company has accumulated with email receivers to ensure more reliable spam delivery. Email receivers and anti-spam organizations eventually add the sending network to a blocklist, at which point delivery of legitimate email from the network is significantly curtailed.

Definitions

Compromised Account: A web hosting account, email account, or server account that has been compromised by a malicious actor such as a spammer, often by stealing or guessing a password, or exploiting a software weakness.
ISP: An organization providing Internet connectivity to customers, whether via wired or wireless connections.
Mailbox Provider: An organization that provides hosting of email inboxes. Examples include Google Apps and Rackspace Mail.
SMTP: The Simple Mail Transfer Protocol, as defined in RFC 5321.
Web Hosting Provider: An organization providing services for hosting web sites and applications.

Existing Approaches to Outbound Spam Control

Today, web hosting providers, ISPs, and mailbox providers employ primarily two techniques to control spam from compromised accounts: content analysis and sender behavior tracking. Combined with an effective policy system, these techniques reduce the impact of spam by detecting and stopping abuse from compromised accounts.

Content Analysis

Outgoing email messages are analyzed for abusive (i.e. spam-like) content using a variety of existing and familiar anti-spam techniques, including:

signature analysis;
URL and domain reputation analysis; and,
heuristics (e.g. regular expressions).

When up-to-date spam signatures, heuristic rules, and URL information are available, content analysis can provide accurate identification of outgoing spam messages, enabling the service provider to block these messages.

Sender Reputation Tracking and Control

Spammers behave differently from legitimate senders. Whereas legitimate senders tend to deliver a predictable volume of email over time, to a repetitive and valid list of recipients, spammers send highly variable volumes of email to often-invalid lists of recipients. Providers can thus detect spammers by tracking behavioral statistics such as:

Email volume – how many messages the sender has sent and how many connections it has established;
Content analysis results – how many spam messages, virus messages, and clean messages the sender has attempted to deliver; and,
Recipient validity – how many valid and invalid email addresses the sender has attempted to deliver email to.

When a sender behaves like a spammer, the provider can rate limit or block the sender, and can take remedial action such as informing the account owner that their account may have been compromised.
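
As a rough illustration of the statistics listed above, the following Python sketch keeps running totals for one sender and flags the sender when simple ratios are exceeded. The class name, field names, and thresholds are all assumptions made for this example; real systems would tune such limits against their own traffic.

```python
from dataclasses import dataclass


@dataclass
class SenderStats:
    """Running totals for one sender (all field names are illustrative)."""
    messages: int = 0
    spam_messages: int = 0
    valid_recipients: int = 0
    invalid_recipients: int = 0

    def invalid_recipient_ratio(self) -> float:
        total = self.valid_recipients + self.invalid_recipients
        return self.invalid_recipients / total if total else 0.0

    def spam_ratio(self) -> float:
        return self.spam_messages / self.messages if self.messages else 0.0


def looks_like_spammer(stats: SenderStats,
                       max_invalid_ratio: float = 0.2,
                       max_spam_ratio: float = 0.1,
                       min_messages: int = 20) -> bool:
    """Return True when the sender exceeds either (hypothetical) threshold.

    Senders with fewer than min_messages are never flagged, because
    ratios computed from a handful of messages are too noisy.
    """
    if stats.messages < min_messages:
        return False
    return (stats.invalid_recipient_ratio() > max_invalid_ratio
            or stats.spam_ratio() > max_spam_ratio)
```

A policy system could call looks_like_spammer() before each delivery attempt and rate limit or block the sender when it returns True.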

Introduction to Response Analytics™

Understanding SMTP Responses

The SMTP protocol is conversational – clients send requests to a server, and the server responds to let the client know whether the request was successful. Server responses always start with a three-digit number, and the value of that number tells us:

Whether the response indicates success, and,
Whether the error is of a temporary or permanent nature.

Successes are generally indicated by a number starting with “2”. For example, “250 Queued” is returned after a message has been received, to indicate the server has received and queued the message for delivery to a recipient. Temporary errors start with “4”, and permanent errors start with “5”. For example, if the SMTP command that specifies a recipient (“RCPT TO”) names an invalid recipient, the server might respond with “550 Invalid recipient”.
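
The first-digit rule can be sketched in a few lines of Python. The function and enum names are assumptions for illustration. The "3" class (an intermediate reply such as 354, which asks the client to continue) is not discussed in the text above but is defined in RFC 5321, so it is included here for completeness.

```python
from enum import Enum


class ReplyClass(Enum):
    SUCCESS = "success"
    INTERMEDIATE = "intermediate"          # 3yz, e.g. 354 after DATA
    TEMPORARY_FAILURE = "temporary failure"
    PERMANENT_FAILURE = "permanent failure"
    UNKNOWN = "unknown"


_BY_FIRST_DIGIT = {
    "2": ReplyClass.SUCCESS,
    "3": ReplyClass.INTERMEDIATE,
    "4": ReplyClass.TEMPORARY_FAILURE,
    "5": ReplyClass.PERMANENT_FAILURE,
}


def classify_reply(line: str) -> ReplyClass:
    """Classify one SMTP reply line by the first digit of its code."""
    code = line[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        return ReplyClass.UNKNOWN
    return _BY_FIRST_DIGIT.get(code[0], ReplyClass.UNKNOWN)
```

For instance, classify_reply("421 Sending too fast") yields ReplyClass.TEMPORARY_FAILURE, which tells the client to try again later rather than give up.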

Example SMTP Session:

S: 220 smtp.example.com ESMTP Postfix 
C: HELO relay.example.org 
S: 250 Hello relay.example.org, I am glad to meet you 
C: MAIL FROM:<bob@example.org> 
S: 250 Ok
C: RCPT TO:<alice@example.com>
S: 250 Ok
C: RCPT TO:<theboss@example.com>
S: 250 Ok
C: DATA
S: 354 End data with <CR><LF>.<CR><LF>
C: From: "Bob Example" <bob@example.org>
C: To: "Alice Example" <alice@example.com>
C: Cc: theboss@example.com 
C: Date: Tue, 15 January 2008 16:02:43 -0500 
C: Subject: Test message 
C:
C: Hello Alice. 
C: This is a test message with 5 header fields and 4 lines in the message body. 
C: Your friend, 
C: Bob
C: . 
S: 250 Ok: queued as 12345 
C: QUIT 
S: 221 Bye
{The server closes the connection}

What can SMTP Responses Tell Us?

Email servers transmit useful information to clients in the error responses they generate. Unfortunately, most email clients don’t fully understand the meaning of these error messages, and so miss out on valuable insights that could improve their delivery success rate or help identify problems.

At the most basic protocol level, an SMTP response indicates whether the request was successful or not, and if it failed, whether the failure was temporary or permanent. Temporary failures indicate that the client should try again later, whereas permanent failures indicate that a similar request made in future will also fail.

Since the advent of the spam problem, email servers have been modified to generate errors in response to certain SMTP commands when the server has reason to believe the client is behaving like a spammer. For instance, a server might reject a message with the error “550 Spam content detected” if its spam filtering system thinks the message contains spam. A temporary failure code might be returned if the server wishes the client to slow down the rate of delivery to the server; for example, “421 Sending too fast”.

Categorizing SMTP Responses

Major email services including Microsoft Live Mail (previously known as Hotmail), AOL, and Gmail all have a compendium of SMTP response codes they generate in order to deal with spam or suspected spamming senders. For example, Google will respond to messages that are not properly formatted with the following error:

550-5.7.1 [192.168.X.Y 11] Our system has detected that this message is 
550-5.7.1 not RFC 2822 compliant. To reduce the amount of spam sent to Gmail, 
550-5.7.1 this message has been blocked. Please review
550 5.7.1 RFC 2822 specifications for more information. mo9si449xxx2pbc.156 - gsmtp

An email client that understood the meaning of this response might send a helpful message to the user who generated the message. Alternatively, an outbound spam control system might use this RFC non-compliance notice as a signal that the sender might be a spammer, given that most legitimate email senders send well-formatted messages.
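
Before a multi-line response like the Gmail example can be interpreted, it has to be reassembled into a single reply. In RFC 5321, every line of a multi-line reply except the last has a hyphen after the three-digit code, and the last line has a space. Many servers also include an enhanced status code such as 5.7.1 (RFC 3463). The Python sketch below, whose names are illustrative assumptions, joins the lines and extracts both codes.

```python
import re
from typing import Iterable, NamedTuple, Optional


class SmtpReply(NamedTuple):
    code: int                 # three-digit reply code, e.g. 550
    enhanced: Optional[str]   # enhanced status code, e.g. "5.7.1", if present
    text: str                 # human-readable text joined into one line


_LINE = re.compile(r"^(\d{3})([ -])(.*)$")
_ENHANCED = re.compile(r"^([245]\.\d{1,3}\.\d{1,3})(?:\s+(.*))?$")


def parse_reply(lines: Iterable[str]) -> SmtpReply:
    """Join the lines of one (possibly multi-line) SMTP reply."""
    code = None
    enhanced = None
    parts = []
    for raw in lines:
        match = _LINE.match(raw.rstrip("\r\n"))
        if match is None:
            raise ValueError(f"not an SMTP reply line: {raw!r}")
        line_code, separator, rest = int(match.group(1)), match.group(2), match.group(3)
        if code is None:
            code = line_code
        elif line_code != code:
            raise ValueError("reply code changed within one reply")
        status = _ENHANCED.match(rest)
        if status:
            enhanced = enhanced or status.group(1)
            rest = status.group(2) or ""
        parts.append(rest.strip())
        if separator == " ":      # a space marks the final line of the reply
            break
    if code is None:
        raise ValueError("empty reply")
    return SmtpReply(code, enhanced, " ".join(p for p in parts if p))
```

Applied to the four lines of the Gmail response, parse_reply() returns code 550, enhanced status 5.7.1, and the message text as one string, which is a convenient input for a categorization step.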

We would like to be able to categorize SMTP responses so that senders can take appropriate action when they receive SMTP error responses. Armed with a better understanding of the meaning of SMTP error responses, senders can conceivably do a better job at filtering spam and other abusive outgoing email traffic.

Choosing Categories

After studying billions of SMTP responses from real-world email servers, we have determined a minimum set of categories into which we can classify responses:

Category	Description	Example	Desired Sender Behavior
Content Rejection	The server has rejected receipt of a message because the content is unacceptable in some way	550 5.7.1 Message contains spam.	Do not send this message, or messages like it, again.
IP Rejection	The server has rejected connections from the client’s IP address. This type of rejection can be temporary or permanent.	421 4.7.0 [TS01] Messages from 1.2.3.4 temporarily deferred due to user complaints – 4.16.55.1; see http://postmaster.yahoo.com/421-ts01.html	Reduce the rate at which connections are established to this server; also, investigate whether there is a spam problem leading to the rejection.
DNS Error	The server has rejected the command because the sender has a DNS-related problem.	550 5.7.1 Illegal HELO	Typically, this error is caused by misconfiguration of DNS entries such as SPF records and/or reverse DNS (i.e. PTR) records. Fix these problems and the error will go away.
Flow Control	The server has rejected the command because it wishes to control the rate at which the client sends commands and/or makes connections.	421 Please slow down	Slow down the rate at which connections are made to the server.
Address Problems	The server has rejected the message because it is addressed to a non-existent or otherwise unreachable recipient. This response may be temporary or permanent in nature.	550 Invalid recipient	For permanent address errors, do not try to send to the address again; remove it from your address book or mailing list. For temporary errors, try again later.

Mapping Responses to Categories

SMTP responses are mapped to categories using a text classification system. Any form of classification system can be used; however, we recommend the use of a heuristic (i.e. rules-based) approach versus “fuzzier” approaches such as neural networks and other machine learning techniques because:

The number of possible “important” responses is small (and therefore reasonably catalogued by human operators); and,
Email receivers tend to be very specific in their responses, and small variations that might be missed by an automated classifier are highly significant.
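
A rules-based classifier of this kind can be sketched as an ordered list of regular expressions, where the first matching rule wins. The five categories come from the table above; the patterns themselves are hypothetical examples written to match the sample responses in this article, not a real production rule set.

```python
import re
from enum import Enum
from typing import Optional


class Category(Enum):
    CONTENT_REJECTION = "Content Rejection"
    IP_REJECTION = "IP Rejection"
    DNS_ERROR = "DNS Error"
    FLOW_CONTROL = "Flow Control"
    ADDRESS_PROBLEMS = "Address Problems"


# Ordered rules: the first pattern that matches decides the category.
# These patterns are illustrative only.
RULES = [
    (re.compile(r"temporarily deferred|user complaints", re.I), Category.IP_REJECTION),
    (re.compile(r"illegal helo|reverse dns|\bPTR\b|\bSPF\b", re.I), Category.DNS_ERROR),
    (re.compile(r"slow down|too fast|too many connections", re.I), Category.FLOW_CONTROL),
    (re.compile(r"invalid recipient|user unknown|no such user", re.I), Category.ADDRESS_PROBLEMS),
    (re.compile(r"\bspam\b|not RFC 2822 compliant", re.I), Category.CONTENT_REJECTION),
]


def categorize(reply: str) -> Optional[Category]:
    """Map an SMTP error reply to a category, or None if no rule matches."""
    if reply[:1] not in ("4", "5"):   # only error replies are categorized
        return None
    for pattern, category in RULES:
        if pattern.search(reply):
            return category
    return None
```

Keeping the rules as plain data makes them easy for human operators to review and extend, which suits the small, carefully worded set of responses that matter.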

Dealing with Changing Responses

Email receivers change their SMTP responses regularly. Anti-spam policies can change frequently, and system administrators often make small but important modifications to SMTP responses. Large receivers (e.g. AOL) publish SMTP responses on Postmaster web sites; however, these sites are often not completely up-to-date.

Because of the changing nature of SMTP responses, any response categorization system needs to be capable of updating its rule set to adapt to the changes as they happen. Ideally, a global set of response categorization rules can be pushed out to the client email server whenever necessary, to ensure categorization is always up to date with whatever changes are implemented by email receivers.
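
One way to make the rule set updatable is to ship it as a versioned document and replace the active rules only when a newer, valid version arrives. The sketch below assumes a JSON format with "version" and "rules" keys; that format, and every name in the code, is an assumption for illustration rather than a description of any existing system.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RuleSet:
    version: int
    rules: Tuple[Tuple["re.Pattern", str], ...]   # (compiled pattern, category)

    def categorize(self, reply: str) -> Optional[str]:
        for pattern, category in self.rules:
            if pattern.search(reply):
                return category
        return None


def load_rule_set(document: str) -> RuleSet:
    """Parse a rule set from the (hypothetical) JSON format."""
    data = json.loads(document)
    rules = tuple(
        (re.compile(rule["pattern"], re.IGNORECASE), rule["category"])
        for rule in data["rules"]
    )
    return RuleSet(version=int(data["version"]), rules=rules)


def apply_update(current: RuleSet, document: str) -> RuleSet:
    """Return the new rule set if it is valid and newer, else keep the current one."""
    try:
        candidate = load_rule_set(document)
    except (ValueError, KeyError, TypeError, re.error):
        return current            # a broken update must not disable categorization
    return candidate if candidate.version > current.version else current
```

Because every pattern is compiled before the swap, a malformed update is rejected as a whole and the server keeps categorizing with the rules it already has.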

Tying SMTP Responses to Sender Reputation

Once we have the capability to categorize SMTP responses, we can use these responses to train a sender behavior system to detect spam-like behavior based on the responses this behavior generates. For example, a spammer is arguably more likely than a legitimate sender to generate permanent message rejection responses.

Each time a sender generates an SMTP error response, we take the category of that response and increment a time-based counter (i.e. statistic) indicating that the sender has been responsible for that response. Later, in a policy system, we can inspect the response statistics for each sender before attempting to deliver a message, and make a decision as to whether the sender has been responsible for too many spam-like responses.
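
The counter and policy check described above might look like the following Python sketch. It keeps a sliding window of timestamps per sender and category, and refuses delivery once any category reaches its limit. The window length, the limits, and all names are assumptions chosen for the example.

```python
import time
from collections import defaultdict, deque
from typing import Dict, Optional


class ResponseCounters:
    """Counts categorized SMTP error responses per sender over a sliding window."""

    def __init__(self, window_seconds: float = 3600.0) -> None:
        self.window = window_seconds
        # (sender, category) -> timestamps, assumed to arrive in time order
        self._events = defaultdict(deque)

    def record(self, sender: str, category: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events[(sender, category)].append(now)

    def count(self, sender: str, category: str, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        events = self._events.get((sender, category))
        if not events:
            return 0
        cutoff = now - self.window
        while events and events[0] <= cutoff:   # drop events older than the window
            events.popleft()
        return len(events)


# Hypothetical per-category limits within one window.
LIMITS: Dict[str, int] = {"Content Rejection": 5, "Address Problems": 20}


def may_send(counters: ResponseCounters, sender: str,
             limits: Dict[str, int] = LIMITS,
             now: Optional[float] = None) -> bool:
    """Policy check: allow delivery only while every category is under its limit."""
    return all(counters.count(sender, category, now) < limit
               for category, limit in limits.items())
```

Because old events age out of the window, a sender who stops generating spam-like responses regains the ability to send without manual intervention.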

Conclusion

Email servers – particularly at large email providers such as AOL and Gmail – provide valuable insights in the error messages they return during SMTP sessions. By categorizing and tracking these responses, it’s possible for email senders to determine when a given user is starting to behave like a spammer. Response categorization allows us to detect spammers more quickly and accurately than if we analyze message content and sender behavior alone.