Information for building a simple Seeks P2P client
Seeks has an API to send queries and receive results.
With the API, a single call to one Seeks node makes that node call other nodes and merge their results.
To call on other nodes, Seeks uses a simple protocol that allows it to retrieve results for a range of similar queries around the initial query. This range is referred to as a query halo.
A developer can also write an application that communicates with each Seeks node directly, without going through the high-level API.
Doing so lets the developer retrieve results (not the meta-search engine results) from each Seeks node of interest, and merge / rank them using his/her own sauce.
The procedure is as follows:
- First create the query halo by computing n-grams over the initial query. Seeks works with n=5, though this is a configurable value. Most importantly, Seeks uses a special type of n-grams:
The sentence
Houston we have a problem
becomes
Houston
...
Houston <skip> have a problem
Houston we <skip> have a problem
...
Houston <skip> <skip> a problem
...
The <skip> word is used to match similar strings. It must be used verbatim, that is, as the exact string <skip>.
Many developers may not want to bother with such a procedure. In this case, generating a subset of the n-grams above will work. Typically, leaving out the <skip> keyword, or computing only unigrams (each single word of the original query), will work. The difference from the reference implementation is that the matching of similar queries will be less efficient.
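As an illustration of the simplified approach, here is a minimal Python sketch that generates the contiguous word n-grams of a query, from unigrams up to n words, without the <skip> keyword. It is not the reference implementation: the function name simple_halo and the whitespace tokenization are assumptions made for this example, and the resulting halo is only a subset of what Seeks generates.

```python
def simple_halo(query, n=5):
    """Return the contiguous word n-grams of `query`, from 1 up to n words.

    Simplified subset of the halo: no <skip> tokens are produced.
    """
    # Assumption: words are separated by whitespace and kept as typed.
    words = query.split()
    grams = []
    for size in range(1, min(n, len(words)) + 1):
        for start in range(len(words) - size + 1):
            grams.append(" ".join(words[start:start + size]))
    return grams


halo = simple_halo("Houston we have a problem", n=5)
for gram in halo:
    print(gram)
```

Calling simple_halo with n=1 yields the unigrams only, which is the smallest subset mentioned above.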
- Hash every generated n-gram with RIPEMD-160
Hashing is a simple step: simply hash every generated string.
The one rule is that if you are using the <skip> keyword, you must be careful that:
- you select strings that do not contain the <skip> keyword,
- rank the words in these strings into alphabetical order before you hash the whole string.
You can compare your halo to the reference hashed halo generation by using
./src/lsh/tests/gen_mrf_query_160 "Houston we have a problem" 0 5
where 5 is the value given to n.
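The hashing step could look like the following Python sketch. Several points are assumptions rather than documented behavior: the helper names canonical and ripemd160_hex are made up for this example, strings are encoded as UTF-8, the digest is kept as a hexadecimal string, and canonical applies one reading of the ordering rule above (words of strings without <skip> are sorted with Python's default, case-sensitive ordering). Compare the output with gen_mrf_query_160 before relying on it. Note also that some Python builds do not ship RIPEMD-160, depending on the underlying OpenSSL configuration.

```python
import hashlib

SKIP = "<skip>"


def canonical(ngram):
    """Sort the words of a <skip>-free n-gram; leave other n-grams untouched.

    This is one interpretation of the ordering rule, not a confirmed one.
    """
    words = ngram.split()
    if SKIP in words:
        return ngram
    return " ".join(sorted(words))


def ripemd160_hex(text):
    """Return the RIPEMD-160 digest of `text` as a 40-character hex string."""
    try:
        digest = hashlib.new("ripemd160")
    except ValueError as exc:
        # Raised when the local OpenSSL build does not provide RIPEMD-160.
        raise RuntimeError("RIPEMD-160 is not available in this Python build") from exc
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def hash_halo_strings(ngrams):
    """Hash every n-gram of the halo after putting it in canonical form."""
    return [ripemd160_hex(canonical(gram)) for gram in ngrams]
```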
- Select a Seeks node and (cf API) send an HTTP POST request whose body contains all the hashes, serialized as protobuffers.
You must use the following protobuffer message structure (see src/plugins/udb_service/halo_msg.proto):
message hash_halo
{
required uint32 expansion = 1;
repeated string key = 2;
}
and use the following HTTP header:
Content-Type: application/x-protobuf
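To make the request concrete, the sketch below serializes a hash_halo message by hand, following the standard protobuf wire format (field 1 as a varint, field 2 as length-delimited strings), and builds the POST request with Python's standard library. The function names are invented for this example, and node_url is a placeholder: the actual path on the node comes from the API documentation and is not guessed here. The meaning of the expansion value is not covered by this page, so it is simply passed through. In practice, classes generated by protoc from halo_msg.proto would normally replace the manual encoding.

```python
import urllib.request


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_hash_halo(expansion, keys):
    """Serialize hash_halo: 'expansion' is field 1, each 'key' is field 2."""
    body = b"\x08" + encode_varint(expansion)  # tag for field 1, varint
    for key in keys:
        raw = key.encode("utf-8")
        body += b"\x12" + encode_varint(len(raw)) + raw  # tag for field 2, bytes
    return body


def build_halo_request(node_url, expansion, keys):
    """Build (without sending) the POST request for one Seeks node."""
    return urllib.request.Request(
        node_url,  # placeholder: take the real URL from the API documentation
        data=encode_hash_halo(expansion, keys),
        headers={"Content-Type": "application/x-protobuf"},
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request), whose read() method returns the raw answer body.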
- The answer comes as a list of results, serialized as protobuffers, using the following message structure (see src/plugins/query_capture/db_query_record_msg.proto):
package sp.db;
import "db_record_msg.proto";
message visited_url
{
required string url = 1;
required int32 hits = 2; /* url hits for this query. */
optional string title = 3; /* url title. */
optional string summary = 4; /* snippet summary. */
optional uint32 url_date = 5; /* URL data date. */
}
message visited_urls
{
repeated visited_url vurl = 1;
}
message related_queries
{
repeated related_query rquery = 1;
}
message related_query
{
required uint32 radius = 1; /* similarity radius to the original query. */
required string query = 2; /* query (may be hashed). */
required uint32 query_hits = 3; /* number of query hits. */
required visited_urls vurls = 4; /* visited urls for this query. */
}
extend sp.db.record
{
required related_queries queries = 4; /* original queries */
}
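The answer can be read back with classes generated by protoc from the .proto files. For readers who want to see the structure without generated code, here is a hedged Python sketch that walks the wire format directly. It assumes the answer body is a serialized sp.db.record whose extension field 4 holds the related_queries message shown above; fields of the record defined in db_record_msg.proto are not shown on this page and are skipped. All function names are made up for this example.

```python
def read_varint(buf, pos):
    """Decode a varint starting at buf[pos]; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field number, wire type, value) for each field of a message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire = tag >> 3, tag & 7
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:  # length-delimited: strings and nested messages
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire in (1, 5):  # fixed 64-bit / 32-bit, kept as raw bytes
            size = 8 if wire == 1 else 4
            value, pos = buf[pos:pos + size], pos + size
        else:
            raise ValueError("unsupported wire type %d" % wire)
        yield number, wire, value


def parse_visited_url(buf):
    names = {1: "url", 3: "title", 4: "summary"}
    out = {}
    for number, _, value in iter_fields(buf):
        if number in names:
            out[names[number]] = value.decode("utf-8")
        elif number == 2:  # int32: negative values arrive as 64-bit varints
            out["hits"] = value - (1 << 64) if value >= (1 << 63) else value
        elif number == 5:
            out["url_date"] = value
    return out


def parse_related_query(buf):
    out = {"vurls": []}
    for number, _, value in iter_fields(buf):
        if number == 1:
            out["radius"] = value
        elif number == 2:
            out["query"] = value.decode("utf-8")
        elif number == 3:
            out["query_hits"] = value
        elif number == 4:  # visited_urls wraps repeated visited_url (field 1)
            out["vurls"] = [parse_visited_url(v)
                            for n, _, v in iter_fields(value) if n == 1]
    return out


def parse_record(buf):
    """Return the related_query entries found in field 4 of a record."""
    queries = []
    for number, wire, value in iter_fields(buf):
        if number == 4 and wire == 2:  # related_queries extension
            queries += [parse_related_query(v)
                        for n, _, v in iter_fields(value) if n == 1]
    return queries
```

Each returned entry is a plain dictionary with the keys radius, query, query_hits and vurls, which keeps the later merging step independent of any protobuf library.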
- error codes: errors are returned as HTML error pages.
- Repeat the HTTP POST for every Seeks node in the ring of interest, and merge the results.
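How results are merged and ranked is left to the developer. As one possible example, the Python sketch below sums the hits of each URL across nodes and ranks URLs first by the smallest radius at which they were seen, then by total hits. This ranking policy, the function name merge_results and the dictionary layout (keys radius and vurls, each visited URL carrying url and hits) are assumptions made for this example, not the ranking Seeks itself applies.

```python
def merge_results(per_node_queries):
    """Merge related_query lists from several nodes into one URL ranking.

    per_node_queries: one list per node, each holding dictionaries with a
    'radius' and a 'vurls' list of {'url': ..., 'hits': ...} entries.
    """
    merged = {}
    for queries in per_node_queries:
        for rquery in queries:
            radius = rquery["radius"]
            for vurl in rquery.get("vurls", []):
                entry = merged.setdefault(
                    vurl["url"],
                    {"url": vurl["url"], "hits": 0, "min_radius": radius},
                )
                entry["hits"] += vurl["hits"]
                entry["min_radius"] = min(entry["min_radius"], radius)
    # Closest queries first, then most hits, then URL for a stable order.
    return sorted(
        merged.values(),
        key=lambda e: (e["min_radius"], -e["hits"], e["url"]),
    )


node_a = [
    {"radius": 0, "vurls": [{"url": "http://example.org/1", "hits": 2}]},
    {"radius": 1, "vurls": [{"url": "http://example.org/2", "hits": 5}]},
]
node_b = [
    {"radius": 1, "vurls": [{"url": "http://example.org/1", "hits": 1},
                            {"url": "http://example.org/3", "hits": 7}]},
]
ranking = merge_results([node_a, node_b])
```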
