Addressing SPAM Mail with a Machine Learning Algorithm Approach
As a communications medium, email has become very useful and practically universal. However, the usefulness of email and its potential for future growth are jeopardized by the rising tide of unwanted email, both spam and viruses, which threatens to wipe out the benefits of email. An important flaw in current email standards (most notably SMTP) is the lack of any technical requirement that ensures reliable identification of the sender of a message. A message's domain of origin can easily be faked, or 'spoofed'. This project would investigate the problem of email spam and identify machine learning methods that efficiently reduce its volume.
This spam mail project would require a lot of technical expertise. However, this does mean that it would be a pretty impressive project, with the potential for a high mark. There are many ways to go about addressing the spam mail problem. One approach would be to check out Weka. On the left of that page, you'll see a link to the software, Weka 3.4. You should also read this paper on spam. If you think you understand most of the concepts in the paper, then 'go and do likewise!'.
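To get a feel for what a machine learning approach to spam might look like, here is a minimal sketch in Python of a naive Bayes text classifier. Treat it as an illustration only: the choice of naive Bayes, the class name NaiveBayesSpamFilter, the tokenize helper and the four toy training messages are all assumptions made for this example. None of them come from Weka or from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical toy classifier: multinomial naive Bayes over word counts."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one labelled message ('spam' or 'ham')."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        """Return the label with the higher log-probability score.

        Assumes at least one message of each label has been trained.
        """
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Log prior: the fraction of training messages with this label.
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Laplace (add-one) smoothing so unseen words do not zero the score.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data, invented purely for this illustration.
training = [
    ("win free money now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
for text, label in training:
    spam_filter.train(text, label)

print(spam_filter.predict("free money offer"))             # spam
print(spam_filter.predict("agenda for the team meeting"))  # ham
```

A real project would need a proper labelled email corpus, a held-out test set, and a comparison of several learning methods, which is the kind of experiment a toolkit such as Weka is designed to support.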
If you do pursue this project, you may want to buy Data Mining: Practical Machine Learning Tools and Techniques (Second Edition) by Witten & Frank.