In the age of advanced Internet technologies there are thousands of websites with all kinds of content, and thousands more appear every day. This raises a question: how can a user find the information they need among all of this? Search engines exist to solve this problem, and they play a very important role: they let users find the information they are looking for, and they help website owners earn revenue by monetizing their traffic.
Today a great number of websites on the World Wide Web let users search the Internet.
For several years, Orbitscripts has been developing search sites that use different approaches to building an optimal search engine. In this article we would like to share our experience with you.
There are two basic methods of building a search solution. They differ in how information is collected, stored and indexed. Let’s examine each of them separately.
Your own search engine
This method assumes that you run your own search engine. The engine’s software collects information from network resources (websites, web pages, various files, etc.) and stores it in its database. The collected information is then indexed, because unindexed, unordered information is useless: a visitor’s search would take far too long. With your own search engine you are independent of anyone else, because everything is in your hands: collecting the data, storing it and using it, for example to create your own search site. Unfortunately, like everything in our world, this solution is not without flaws: it places strict requirements on hardware and network resources. Collecting the information requires high-bandwidth Internet access, storing the collected websites requires a large amount of storage space, and indexing requires significant computing resources (processors, RAM, etc.); without indexing, the information will not appear in search results. We must admit that this solution is rather expensive and not everyone can afford it. Examples of search engines that use this method are Google, Yahoo and Ask.
Getting search results from other search engines
In this case your system acts as a broker between your visitors and other search engines. The workflow is rather simple: your system receives a request from a visitor, forwards it to the other search engines, receives their answers and returns them to the visitor. The advantage of this method is that it requires no significant investment in computing resources. You don’t have to collect information from a large number of websites in advance and store it on your server for indexing. All you need is high-bandwidth Internet access, which most providers already offer.
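The broker workflow can be sketched in a few lines of Python. This is only an illustration: `fake_engine` and `broker_search` are hypothetical names, the "engine" is a local stand-in that makes no network calls, and the lowercasing step is an assumed example of translating the visitor's request.

```python
from typing import Callable, List

# A "search engine" here is any callable that takes a query string and
# returns a list of result URLs. A real one would be an HTTP client.
Engine = Callable[[str], List[str]]


def fake_engine(query: str) -> List[str]:
    """Stand-in for a remote search engine (hypothetical, no network)."""
    return [f"http://example.com/{query}/1", f"http://example.com/{query}/2"]


def broker_search(query: str, engine: Engine) -> List[str]:
    """Forward the visitor's query to the remote engine and relay the answer."""
    normalized = query.strip().lower()  # translate the visitor's request
    return engine(normalized)           # receive the answer and return it


print(broker_search("  Metasearch ", fake_engine))
```

Because the engine is passed in as a parameter, the same broker function works with any remote search resource that can be wrapped in such a callable.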
However, this solution is not without drawbacks either. Because the data you work with belongs to other resources, your system is less reliable. The format of the data returned by a search engine may change at any time, which means your search becomes useless until you adapt your system to the change. A visitor’s request is also handled in two steps: first the request to your site, then your request to the search resource and the trip back, so the response time for the user increases. This leads to another hidden drawback: your system depends entirely on the response time of the search engine whose resources you use. A large number of requests to the search engine’s service can have unpleasant consequences: your system can be banned. Or the search resource you are using may disappear from the Internet landscape altogether. In both cases your system stops working.
But let’s not jump to conclusions. Indeed, if you build a system on top of a single search resource, there is little point in using this method. But what if you increase the reliability of your system by using several search engines? Imagine that instead of one search resource you use ten. Even if three of the search engines refuse your request and two of them have changed their data format, you still have five working search engines, so the visitor still receives search results for the request. To avoid being banned, use caching. When a visitor sends a request to your system, you store the response received from the search engine on your side. When a similar request arrives from another visitor (or from the same one), you don’t have to send a second request to the remote search service: the visitor simply gets the previously received response, which also reduces latency. This way you cut down the number of requests to remote servers and increase the reliability of your system.
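A minimal sketch of such a cache in Python might look like this. The class name `CachedSearch`, the expiry time `ttl_seconds` and the way queries are normalized to detect "similar" requests are all assumptions made for the example; they are not taken from any particular product.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedSearch:
    """Wrap a remote engine and reuse recent answers for repeated queries."""

    def __init__(self, engine: Callable[[str], List[str]],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._engine = engine
        self._ttl = ttl_seconds   # assumed: cached answers expire after a while
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}
        self.remote_calls = 0     # how many times the remote engine was asked

    def search(self, query: str) -> List[str]:
        # Treat queries that differ only in case or spacing as the same one.
        key = " ".join(query.lower().split())
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]         # served locally, no remote request
        self.remote_calls += 1
        results = self._engine(key)
        self._cache[key] = (now, results)
        return results


cached = CachedSearch(lambda q: [f"http://example.com/?q={q}"])
cached.search("Web  Search")
cached.search("web search")      # answered from the cache
print(cached.remote_calls)
```

The expiry time is a trade-off: a longer one saves more remote requests, a shorter one keeps the results fresher.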
You may think that using ten resources instead of one will increase the time needed to query them all, making your system’s response time at least ten times longer. You are partly right. If the search resources are queried sequentially, you will indeed face this problem. Therefore we suggest querying the search resources in parallel.
Our company has used parallel requests to search resources for several years, and the method has proved to be excellent. However, it has its own nuances. Suppose that, with parallel requests, nine sources return results within 3 seconds, while the tenth sometimes takes 2 seconds and sometimes 9. If we wait for all ten sources, the overall latency can reach 9 seconds, which is too long, and the cause is the tenth resource, the weakest link. The best solution is to set a fixed time interval during which your system accepts data from remote search resources. In our example, with an interval of 4 seconds, the results from nine or ten resources always arrive within 4 seconds!
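Parallel requests with a fixed time interval can be sketched with Python's standard `asyncio` module. The engines below are simulated with `asyncio.sleep` instead of real network calls, and the names `make_engine`, `metasearch` and `deadline` are hypothetical; the delays are scaled down so the example finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical engine type: an async callable from query to result URLs.
Engine = Callable[[str], Awaitable[List[str]]]


def make_engine(name: str, delay: float, fail: bool = False) -> Engine:
    """Build a simulated remote engine (no network; names are made up)."""
    async def engine(query: str) -> List[str]:
        await asyncio.sleep(delay)  # simulated network latency
        if fail:
            raise RuntimeError(f"{name} refused the request")
        return [f"http://{name}.example/{query}"]
    return engine


async def metasearch(query: str, engines: Dict[str, Engine],
                     deadline: float) -> Dict[str, List[str]]:
    """Query all engines in parallel; keep only answers that arrive in time."""
    if not engines:
        return {}
    tasks = {asyncio.create_task(engine(query)): name
             for name, engine in engines.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=deadline)
    for task in pending:            # too slow: stop waiting for it
        task.cancel()
    results: Dict[str, List[str]] = {}
    for task in done:
        if task.cancelled() or task.exception() is not None:
            continue                # this engine failed: skip it
        results[tasks[task]] = task.result()
    return results


engines = {
    "fast": make_engine("fast", 0.01),
    "slow": make_engine("slow", 5.0),                  # the weakest link
    "broken": make_engine("broken", 0.01, fail=True),  # refuses the request
}
results = asyncio.run(metasearch("metasearch", engines, deadline=0.5))
print(results)
```

Only the engine that answered within the interval contributes results. The slow engine is cut off at the deadline and the failing one is skipped, so a single weak or broken source does not hold up the visitor.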
The diagram below explains the difference between the sequential and parallel request schemes:
This method has clear advantages over running your own search engine. You can provide your visitors with results from several search engines, so they don’t have to visit every search resource and collect results from each of them separately. With the parallel method your system merges similar results from different search engines into one and displays meta-information indicating the position and the sources of each result. Such information is certainly more interesting to the visitor than a result from a single system.
Here is how search results collected from several search engines may look:
This kind of search is more efficient than searching each engine separately.
This solution is called Metasearch. When creating your own system or choosing an existing one, note that the results returned by several search resources can be “ranked” and then sorted by the value of the “rank”. What is a “rank”? It is a value calculated by a particular algorithm, called “ranking”. Let’s examine one such algorithm. Suppose that a result’s “rank” is higher the more search resources it was found in and the higher its average position across those resources. The image above shows how this value is obtained for each result and how the returned search results are sorted by it. This type of search is called Ranking Metasearch.
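The example algorithm described above can be sketched in Python as follows. The text does not say how the two criteria are combined, so this sketch assumes that the number of engines is compared first and the average position only breaks ties; the function name `rank_results` and the sample URLs are made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_results(results_by_engine: Dict[str, List[str]]
                 ) -> List[Tuple[str, int, float]]:
    """Merge per-engine result lists into one ranked list.

    Returns (url, engines_found_in, average_position) tuples, best first.
    Position 1 is the top of an engine's list, so a smaller average is better.
    """
    positions: Dict[str, List[int]] = defaultdict(list)
    for urls in results_by_engine.values():
        seen = set()
        for position, url in enumerate(urls, start=1):
            if url in seen:       # count each URL once per engine
                continue
            seen.add(url)
            positions[url].append(position)

    ranked = [(url, len(found), sum(found) / len(found))
              for url, found in positions.items()]
    # Assumed ordering: more engines first, then better average position,
    # then the URL itself so that the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[2], item[0]))
    return ranked


sample = {
    "engine_a": ["http://x.example", "http://y.example", "http://z.example"],
    "engine_b": ["http://y.example", "http://x.example"],
    "engine_c": ["http://y.example", "http://w.example"],
}
for url, engines_found_in, average_position in rank_results(sample):
    print(url, engines_found_in, round(average_position, 2))
```

In this sample, `http://y.example` comes first because all three engines returned it, followed by `http://x.example`, which two engines returned. The remaining results were each found by one engine and are ordered by position.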
Today the Internet is not only a way of sharing information but also a huge business machine through which people earn billions of dollars. The main source of revenue is advertising placed on websites. How do you get ads for your website? It’s simple, and interestingly the process is similar to Metasearch. You register with servers that provide ads (Miva.com, LookSmart.com, …), then request results from the assigned links for certain keywords and display the received results on your website. You earn revenue from ad impressions or from “clicks” that visitors make on the ads. The point is that with every search the visitor sees both search results and ads from several servers at once. (Let’s hope he makes a “click” :-))
The general scheme of a search system of this type may look as follows:
Our company has a range of products of this type, including Ranking Metasearch. We have produced thousands of modifications of our products, implementing this type of search to meet the most sophisticated requirements for both Metasearch and Ranking Metasearch.
To better understand the nature of this type of search, see these examples of Metasearch systems:
· SmartPPC Search
· SmartPPC Power with lots of search plug-ins, and Ranking Metasearch in particular.
These products combine all the methods described above to put the advantages of Metasearch into practice. In addition, Orbitscripts customizes the algorithms of this technology and the methods of sorting and processing data according to our customers’ requirements. We’re ready to bring any wild idea to life!