|
The TrustRank algorithm is designed to combat web spam. Web
spam denotes web pages that are the result of spamming.
Spamming is any deliberate action taken solely to boost a web
page's position in search engine results, incommensurate with
the page's real value.
The web spam taxonomy contains:
a) Boosting Techniques
b) Hiding Techniques
Boosting techniques are further divided into:
(i) Term spamming
(ii) Link spamming
Term spamming - Manipulating the text of web pages in order to
appear relevant to queries. The targets of term spamming are the
body of the web page, the title, the URL, HTML meta tags, and
anchor text.
Term spamming techniques:
- Repetition of one or a few specific terms, e.g., free, cheap,
Viagra. The goal is to subvert TF-IDF ranking schemes.
- Dumping of a large number of unrelated terms, e.g., copying
entire dictionaries.
- Weaving: copy legitimate pages and insert spam terms at random
positions.
- Phrase stitching: glue together sentences and phrases from
different sources.
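To illustrate why repetition can subvert a TF-IDF ranking scheme, here is a minimal sketch. The function name `tf_idf`, the smoothed IDF formula, and the three toy documents are assumptions made for illustration only; real search engines use more elaborate scoring.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Score `term` in `doc`: relative term frequency times a smoothed IDF."""
    tf = Counter(doc)[term] / len(doc)
    containing = sum(1 for d in corpus if term in d)
    # Smoothed IDF (an assumed variant) so the value is always positive.
    idf = math.log((1 + len(corpus)) / (1 + containing)) + 1
    return tf * idf


# Toy documents, already tokenized into words (hypothetical data).
honest = "we sell cheap flights and hotel deals".split()
spam = ("cheap " * 20 + "flights").split()  # one term repeated 20 times
other = "weather report for the coming week".split()
corpus = [honest, spam, other]

honest_score = tf_idf("cheap", honest, corpus)
spam_score = tf_idf("cheap", spam, corpus)

# The IDF factor is identical for both pages, so the page that repeats
# the term wins purely on term frequency.
print(f"honest page: {honest_score:.3f}")
print(f"spam page:   {spam_score:.3f}")
```

Because both pages share the same IDF for the query term, the spam page outranks the honest one only through its inflated term frequency.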
Link spamming - Creating link structures that boost PageRank or
hub and authority scores. There are three kinds of web pages
from a spammer's point of view:
1. Inaccessible pages
2. Accessible pages, e.g., web log comment pages where the
spammer can post links to his pages
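The effect of posting a link on an accessible page can be sketched with a small PageRank computation. This is a simplified power-iteration version under stated assumptions: the function `pagerank`, the damping factor of 0.85, the handling of pages without outlinks, and the three-page toy graph are all illustrative choices, not taken from the text above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Page with no outlinks: spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph before the spammer acts: nobody links to "spam".
before = {
    "news": ["blog"],
    "blog": ["news"],
    "spam": [],
}
# The spammer posts a comment on the accessible "blog" page,
# which adds a link from "blog" to "spam".
after = {
    "news": ["blog"],
    "blog": ["news", "spam"],
    "spam": [],
}

rank_before = pagerank(before)
rank_after = pagerank(after)
print(f"spam page before: {rank_before['spam']:.3f}")
print(f"spam page after:  {rank_after['spam']:.3f}")
```

A single link from an accessible page is enough to raise the spam page's score in this toy graph.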
|