Webmasters FAQ

Malta's Online Search Engine and Business Exchange

Webmaster Questions

What is cloaking?

The term "cloaking" is used to describe a website that returns altered web pages to search engines crawling the site. In other words, the webserver is programmed to return different content to Maltalinks than it returns to regular users, usually in an attempt to distort search engine rankings. This can mislead users about what they'll find when they click on a search result. To preserve the accuracy and quality of our search results, Maltalinks may permanently ban from our index any sites or site authors that engage in cloaking to distort their search rankings.

Do I need to submit updated and/or outdated links and pages to Maltalinks?

Maltalinks updates its index frequently, so there's no need to submit updated or outdated links. We should pick up any changes to your site during our next crawl.

How do I submit multiple pages?

Please visit our Submit URL page to input your URLs. There's no need to submit each individual page; the domain's top-level page will suffice. Our Maltalinks crawler will find the rest.

Why doesn't Maltalinks index any of my pages?

If your pages haven't been indexed yet, it's probably because too few other pages on the web link to them, or because the Maltalinks crawler has not yet discovered your site. If your site's content is not related to Malta and was not created by a Maltese person or someone connected to Malta, Maltalinks may exclude your pages from its index. If you have not yet submitted your site to Maltalinks, submitting your top-level URL is a good starting point.

How long does the Maltalinks robot take to index a URL once it's been submitted?

Depending on the timing of the submission and of our crawl, the entire process can take between six and eight weeks.

Where is my page's title?

Unlike many search engines, Maltalinks can return results for pages that are known but haven't been crawled yet. Since we haven't looked at those pages yet, their titles aren't shown; the Maltalinks results page displays the URL instead and in many cases will display a "No Summary Available" label in place of the page summary.

How should I request that Maltalinks not return cached material from my site?

Maltalinks stores many web pages in its cache so that it can show them to users as a backup if the server hosting a page temporarily fails. Users can view the cached version by clicking the "Cached" link on the search results page. If you don't want your content to be accessible through Maltalinks' cache, use a <META> tag with a CONTENT="NOARCHIVE" attribute. To do so, place the following line in the <HEAD> section of your documents:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag tells robots not to archive the page. Maltalinks will continue to index and follow links from the page, but will not present cached material to users. If you want to allow other robots to cache your content, but prevent Maltalinks' robots from doing so, use the following tag:

<META NAME="MALTALINKS" CONTENT="NOARCHIVE">

Please note that the change will take effect the next time Maltalinks crawls the page containing the NOARCHIVE directive in a <META> tag. If you want this change to take effect sooner, the site owner must contact us and request immediate removal of archived content. Note also that the NOARCHIVE directive only controls whether a cached version of the page is made available. To control whether the page is indexed, use CONTENT="NOINDEX". To control whether links are followed, use CONTENT="NOFOLLOW". For more information, see the Robots Exclusion page.
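To illustrate how a crawler might read these tags, here is a minimal Python sketch built on the standard library's html.parser. It collects the directives from <META> tags whose NAME is ROBOTS (all robots) or a given crawler name such as MALTALINKS, comparing names and values case-insensitively and splitting comma-separated CONTENT values such as "NOINDEX, NOFOLLOW". The class and function names are hypothetical; this is an illustration, not Maltalinks' actual crawler code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <META NAME="ROBOTS"> and one crawler-specific NAME."""

    def __init__(self, robot_name):
        super().__init__()
        # NAME values that apply to this crawler: the generic one plus its own.
        self.names = {"robots", robot_name.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name in self.names:
            content = attrs.get("content") or ""
            # CONTENT may list several comma-separated directives.
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html, robot_name="maltalinks"):
    """Return the set of lower-cased robots directives that apply to robot_name."""
    parser = RobotsMetaParser(robot_name)
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<HTML><HEAD><META NAME="ROBOTS" CONTENT="NOARCHIVE"></HEAD><BODY></BODY></HTML>'
found = robots_directives(page)
print("noarchive" in found)  # True: do not offer a cached copy
print("noindex" in found)    # False: the page may still be indexed
```

Because NOARCHIVE, NOINDEX and NOFOLLOW are independent directives, the sketch returns a set rather than a single value; a page can carry any combination of them.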

Maltalinks Technology Questions

How should I request that Maltalinks not crawl part or all of my site?

The standard for robot exclusion given at http://www.robotstxt.org/wc/norobots.html provides for a file called robots.txt that you can put on your server to exclude Maltalinks and other web crawlers. (Maltalinks has a user-agent of "mlinks".)

Maltalinks also understands some extensions to the robots.txt standard. Disallow patterns may include * to match any sequence of characters, and a pattern may end in $ to indicate that it must match the end of the URL. For example, to prevent Maltalinks from crawling files that end in .gif, you may use the following robots.txt entry:

	User-Agent: mlinks
	Disallow: /*.gif$
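One way to understand these extensions is as a translation into a regular expression: * matches any run of characters, a trailing $ anchors the pattern to the end of the URL, and everything else must match literally from the start of the path. The Python sketch below shows that translation; the function names are hypothetical and it is an illustration, not Maltalinks' crawler code.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with * and $ extensions into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each * back into "any sequence of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start of the path; \Z anchors at the very end when $ was given.
    return re.compile(body + (r"\Z" if anchored else ""))


def pattern_matches(pattern, path):
    """True if the URL path (including any query string) matches the robots.txt pattern."""
    return pattern_to_regex(pattern).match(path) is not None


print(pattern_matches("/*.gif$", "/images/logo.gif"))      # True
print(pattern_matches("/*.gif$", "/images/logo.gif?x=1"))  # False: $ requires .gif at the end
print(pattern_matches("/cgi-bin", "/cgi-bin/search"))      # True: plain patterns are prefixes
```

Note that without the trailing $, "/*.gif" would also match "/images/logo.gif?x=1", since an unanchored pattern only needs to match a prefix of the URL.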

Please note that Maltalinks does not interpret a 401 ("Unauthorized") or 403 ("Forbidden") response to a robots.txt fetch as a request not to crawl any pages on the site. To prevent Maltalinks and other web crawlers from crawling any page on your site, use the following robots.txt entry instead:

	User-Agent: *
	Disallow: /

Please note also that each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, if you wanted to allow all file types to be crawled via http but only .html pages to be crawled via https, the robots.txt file for the http protocol (http://yourserver.com/robots.txt) would be:

	User-Agent: *
	Allow: /

The robots.txt file for the https protocol (https://yourserver.com/robots.txt) would be:

	User-Agent: *
	Disallow: /
	Allow: /*.html$
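Because each protocol and port has its own robots.txt file, a crawler has to work out which file governs a given page from that page's scheme, host and port. A minimal Python sketch using urllib.parse follows; the function name robots_txt_url is hypothetical. It compares hosts textually, so it would treat http://yourserver.com:80/ and http://yourserver.com/ as different even though they name the same port; a real crawler would probably normalize default ports first.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs page_url (same scheme, host and port)."""
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host plus any explicit port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://yourserver.com/docs/page.html"))   # http://yourserver.com/robots.txt
print(robots_txt_url("https://yourserver.com/docs/page.html"))  # https://yourserver.com/robots.txt
print(robots_txt_url("http://yourserver.com:8080/index.html"))  # http://yourserver.com:8080/robots.txt
```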

Another standard which is more convenient for page-by-page use involves adding a <META> tag to an HTML page to tell robots not to index the page or not to follow the links it contains. This standard is described at http://www.robotstxt.org/wc/exclusion.html. You may also want to read what the HTML standard has to say about these tags. Remember that changing your server's robots.txt file or changing the <META> tags on its pages will not cause an immediate change in the results that Maltalinks returns, since your changes must propagate to Maltalinks' next index of the web before being reflected in Maltalinks search results.

Why is Maltalinks asking for a file called robots.txt that isn't on my server?

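robots.txt is a standard document that can tell Maltalinks not to download some or all information from your web server. Like other well-behaved crawlers, Maltalinks requests this file before crawling your site, so the request will appear in your logs even if the file doesn't exist; a missing robots.txt simply means that no part of your site is excluded. For information on how to create a robots.txt file, see The Robot Exclusion Standard.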

Why is Maltalinks trying to download incorrect links from my server? Or from a server that doesn't exist?

It's a fact of life on the web that many links will be broken or outdated at any given time. Whenever someone publishes an incorrect link that points to your site (perhaps through a typo or a spelling error) or fails to update their pages to reflect changes on your server, Maltalinks will try to download an incorrect link from your site. This is also why you may get hits on a machine that isn't a web server at all.

Why is Maltalinks downloading information from our "secret" web server?

It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, the browser is likely to send your "secret" URL in the HTTP Referer header, and the other web server can store it in its referer log and possibly publish it. So, if there is a link to your "secret" web server or page anywhere on the web, it is likely that Maltalinks and other web crawlers will find it.

Why isn't Maltalinks obeying my robots.txt file?

To save bandwidth, Maltalinks does not fetch your robots.txt file before every request; it downloads the file again only after fetching many pages from your server. So it may take a while for Maltalinks to learn of any changes you have made to your robots.txt file. Also, check your syntax against the standard at http://www.robotstxt.org/wc/norobots.html. If there still seems to be a problem, please let us know and we'll correct it.

Please note that there is a small difference between the way Maltalinks handles the robots.txt file and the way the robots.txt standard says we should (keeping in mind the distinction between "should" and "must"). The standard says we should obey the first applicable rule, whereas Maltalinks obeys the longest (that is, the most specific) applicable rule. This more intuitive practice matches what people actually do, and what they expect us to do. For example, consider the following robots.txt file:

	User-Agent: *
	Allow: /
	Disallow: /cgi-bin

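Under a strict first-match reading, the "Allow: /" line would match every URL, including those under /cgi-bin, so the Disallow line would never take effect. Yet the webmaster's intent here is clearly to allow robots to crawl everything except the /cgi-bin directory. Because "Disallow: /cgi-bin" is the longer matching rule for those URLs, that's exactly what Maltalinks does.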
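The longest-match rule can be sketched in a few lines of Python. This simplified version treats each rule as a plain path prefix (the * and $ extensions are left out), applies the longest rule that matches, and allows the URL when no rule matches. The (directive, prefix) representation and the function name is_allowed are assumptions for illustration, and so is the tie-break: the text above doesn't say what happens when an Allow and a Disallow of equal length both match, so the sketch lets Allow win.

```python
def is_allowed(rules, path):
    """Decide whether path may be crawled, obeying the longest matching rule.

    rules is a list of (directive, prefix) pairs from one User-Agent group,
    e.g. [("Allow", "/"), ("Disallow", "/cgi-bin")]. Prefixes are matched
    literally; the * and $ extensions are not handled in this sketch.
    """
    best = None  # (prefix length, is_allow) of the most specific match so far
    for directive, prefix in rules:
        # Skip empty values (they exclude nothing) and rules that don't match.
        if not prefix or not path.startswith(prefix):
            continue
        candidate = (len(prefix), directive.lower() == "allow")
        # Tuples compare by length first; on a tie, True > False lets Allow win (an assumption).
        if best is None or candidate > best:
            best = candidate
    return True if best is None else best[1]


rules = [("Allow", "/"), ("Disallow", "/cgi-bin")]
print(is_allowed(rules, "/index.html"))      # True: only "Allow: /" matches
print(is_allowed(rules, "/cgi-bin/search"))  # False: "Disallow: /cgi-bin" is longer
```

A first-match implementation would stop at "Allow: /" for both paths and return True each time, which is the behavior the longest-match rule is meant to avoid.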

How do I register my site with Maltalinks so it will be indexed?

Please visit the Submit URL form.

How do I remove a site from Maltalinks?

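Maltalinks updates its entire index automatically on a regular basis. When we crawl the web, we find new pages, discard dead links, and update links automatically. Links that are outdated now will most likely "fade out" of our index during our next crawl. To keep pages that are still online out of the index, exclude them with a robots.txt file or a NOINDEX <META> tag, as described above.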

Help! Maltalinks is crawling my site too fast. What can I do?

Please send an email to links@maltalinks.com with the name of your site and a detailed description of the problem. Please also include a portion of your web server log that shows Maltalinks' accesses, so that we can track down the problem more quickly on our end.
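If your server writes its access log in the Apache/NCSA "combined" format, where the client address is the first field and the user-agent is the last double-quoted field, a short script can pull out the requests whose user-agent contains "mlinks" and count them per client address. The following Python sketch assumes that log layout; the function name and sample lines are hypothetical, and other log formats would need a different pattern.

```python
import re
from collections import Counter

# Assumed log layout: Apache/NCSA "combined" format, where the client address is the
# first field and the user-agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(?P<client>\S+) .*"(?P<agent>[^"]*)"\s*$')


def maltalinks_hits(lines, agent_token="mlinks"):
    """Return (matching lines, Counter of requests per client address) for the crawler."""
    matches = []
    per_client = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and agent_token in m.group("agent").lower():
            matches.append(line.rstrip("\n"))
            per_client[m.group("client")] += 1
    return matches, per_client


sample = [
    '10.0.0.5 - - [12/Mar/2002:10:00:01 +0100] "GET / HTTP/1.0" 200 5120 "-" "mlinks"\n',
    '10.0.0.6 - - [12/Mar/2002:10:00:01 +0100] "GET /a.html HTTP/1.0" 200 812 "-" "mlinks"\n',
    '192.168.1.9 - - [12/Mar/2002:10:00:02 +0100] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0"\n',
]
lines, counts = maltalinks_hits(sample)
print(len(lines))    # 2
print(dict(counts))  # {'10.0.0.5': 1, '10.0.0.6': 1}
```

The matching lines are what you would paste into your email; the per-address counts show how the requests are spread across Maltalinks' crawling machines.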

Why are there hits from multiple machines at Maltalinks.com, all with user-agent mlinks?

Maltalinks is designed to run distributed across several machines, which improves performance and lets the crawler scale as the web grows.

My question is not answered here. Where can I send it?

Please visit our Contact Us page to find the appropriate place to send your question.

© Copyright 2000-2002 Maltalinks Ltd. All Rights Reserved.