Information for building a simple Seeks P2P client

Seeks has an API to send queries and receive results.

With the API, a single call to a Seeks node makes that node call other nodes and merge their results.

To call on other nodes, Seeks uses a simple protocol that allows it to retrieve results for a range of similar queries around the initial query. This range is referred to as a query halo.

A developer can also write an application that communicates with each Seeks node directly, without going through the high-level API.

Doing so lets the developer retrieve results (not the meta-search engine results) from each Seeks node of interest, and merge and rank them with a method of their own.

The procedure is as follows:

  • First create the query halo by computing n-grams over the initial query. Seeks works with n=5, though this is a configurable value. Most importantly, Seeks uses a special type of n-grams:

The sentence

Houston we have a problem

becomes

Houston
...
Houston <skip> have a problem
Houston we <skip> a problem
...
Houston <skip> <skip> a problem
...

The <skip> word is used to match similar strings. It must appear literally, written exactly as <skip>.

Many developers may not want to bother with such a procedure. In that case, generating a subset of the n-grams above will work. Typically, either leaving out the <skip> keyword or computing only unigrams (each single word of the original query) is enough. The only difference from the reference implementation is that similar queries will be matched less efficiently.
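The halo generation described above can be sketched as follows. This is only one plausible reading of the "Houston" example, not the reference algorithm: it assumes whitespace tokenization, takes every contiguous window of up to n words, and replaces any combination of interior words with <skip>. The function name query_halo and its parameters are hypothetical; compare the output with the reference tool before relying on it.

```python
from itertools import combinations

SKIP = "<skip>"  # literal token expected by Seeks


def query_halo(query, n=5, use_skip=True):
    """Return a set of n-gram strings (1 to n words) built from the query.

    Assumption: a <skip> may replace any interior word of a window;
    the first and last word of each window are always kept.
    """
    words = query.split()
    halo = set()
    for size in range(1, min(n, len(words)) + 1):
        for start in range(len(words) - size + 1):
            window = words[start:start + size]
            halo.add(" ".join(window))
            if not use_skip or size < 3:
                continue  # no interior word to skip
            interior = range(1, size - 1)
            for count in range(1, len(interior) + 1):
                for skipped in combinations(interior, count):
                    gram = [SKIP if i in skipped else word
                            for i, word in enumerate(window)]
                    halo.add(" ".join(gram))
    return halo
```

Calling query_halo("Houston we have a problem", n=1) yields the unigram subset mentioned above, and use_skip=False gives the variant without the <skip> keyword.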

  • Hash every generated n-gram with RIPEMD-160

Hashing is straightforward: hash every generated string.

The one rule, if you are using the <skip> keyword, is to take care to:

  • select the strings that do not contain the <skip> keyword,
  • sort the words of these strings into alphabetical order before hashing the whole string.

You can compare your halo to the reference hashed halo generation by using

./src/lsh/tests/gen_mrf_query_160 "Houston we have a problem" 0 5

where 5 is the value given to n.
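A minimal sketch of the hashing step, under several assumptions that should be checked against the output of gen_mrf_query_160: the strings are encoded as UTF-8, the hash is sent as a hexadecimal digest, and the alphabetical-order rule is read as "sort the words of any string that contains no <skip>". The helper names are hypothetical. Note that Python's hashlib only offers RIPEMD-160 when the underlying OpenSSL build provides it.

```python
import hashlib

SKIP = "<skip>"


def canonical_form(ngram):
    """Sort the words of a skip-free n-gram; leave n-grams with <skip> as-is."""
    words = ngram.split()
    if SKIP in words:
        return ngram
    # sorted() orders by code point, so uppercase sorts before lowercase;
    # whether the reference implementation normalizes case is not known here.
    return " ".join(sorted(words))


def ripemd160_hex(text):
    """Return the RIPEMD-160 digest of text as 40 hex characters.

    hashlib.new raises ValueError if the local OpenSSL lacks ripemd160.
    """
    digest = hashlib.new("ripemd160")
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def hash_halo(ngrams):
    """Hash every n-gram of the halo and return the distinct digests, sorted."""
    return sorted({ripemd160_hex(canonical_form(gram)) for gram in ngrams})
```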

  • Select a Seeks node and (see the API) send it an HTTP POST request whose body contains all the hashes, serialized as protobuffers:

http://seeks.fr/find_bqc?

You must use the following protobuffer message structure (see src/plugins/udb_service/halo_msg.proto):

message hash_halo
{
 required uint32 expansion = 1;
 repeated string key = 2;
}

and set the following HTTP header:

Content-Type: application/x-protobuf
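The request can be sketched without generated protobuf classes, since hash_halo only contains a varint field and repeated strings. The code below hand-encodes the message using the standard protobuf wire format and posts it with the Python standard library. In practice one would more likely compile halo_msg.proto with protoc. The function names are hypothetical, and the meaning of the expansion value is not described here, so it is simply passed through.

```python
import urllib.request


def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_hash_halo(expansion, keys):
    """Serialize hash_halo: field 1 = expansion, field 2 = repeated key."""
    body = b"\x08" + encode_varint(expansion)  # tag for field 1, varint
    for key in keys:
        raw = key.encode("utf-8")
        body += b"\x12" + encode_varint(len(raw)) + raw  # tag for field 2, bytes
    return body


def post_halo(node_url, expansion, keys, timeout=10):
    """POST the serialized halo to one node and return the raw response body."""
    request = urllib.request.Request(
        node_url,
        data=encode_hash_halo(expansion, keys),
        headers={"Content-Type": "application/x-protobuf"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call would then look like post_halo("http://seeks.fr/find_bqc?", expansion, hashes), where hashes is the list of hashed n-grams.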

  • The answer comes as a list of results, serialized as protobuffers, using the following message structure (see src/plugins/query_capture/db_query_record_msg.proto):

package sp.db;
import "db_record_msg.proto";
message visited_url
{
 required string url = 1;
 required int32 hits = 2; /* url hits for this query. */
 optional string title = 3; /* url title. */
 optional string summary = 4; /* snippet summary. */
 optional uint32 url_date = 5; /* URL data date. */
}
message visited_urls
{
 repeated visited_url vurl = 1;
}
message related_queries
{
 repeated related_query rquery = 1;
}
message related_query
{
 required uint32 radius = 1;            /* similarity radius to the original query. */
 required string query = 2;             /* query (may be hashed). */
 required uint32 query_hits = 3;        /* number of query hits. */
 required visited_urls vurls = 4;       /* visited urls for this query. */
}
extend sp.db.record
{
 required related_queries queries = 4; /* original queries */
}
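
Reading such an answer can be sketched with a small wire-format walker. This is an illustration under stated assumptions, not the reference decoder: it assumes the response body is a serialized sp.db.record whose field 4 carries related_queries, and it ignores every other field of the record, since db_record_msg.proto is not shown here. All function names are hypothetical; classes generated by protoc from the .proto files would normally be preferable.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def fields(buf):
    """Yield (field_number, value) for each field of one serialized message."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:                     # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                   # length-delimited
            size, pos = read_varint(buf, pos)
            value, pos = bytes(buf[pos:pos + size]), pos + size
        elif wire_type in (1, 5):              # fixed 64/32 bit, unused here
            size = 8 if wire_type == 1 else 4
            value, pos = bytes(buf[pos:pos + size]), pos + size
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        yield number, value


def parse_visited_url(buf):
    out = {"url": None, "hits": None, "title": None,
           "summary": None, "url_date": None}
    text_fields = {1: "url", 3: "title", 4: "summary"}
    for number, value in fields(buf):
        if number in text_fields:
            out[text_fields[number]] = value.decode("utf-8", "replace")
        elif number == 2:
            # int32 travels as a 64-bit two's-complement varint
            out["hits"] = value - (1 << 64) if value >= (1 << 63) else value
        elif number == 5:
            out["url_date"] = value
    return out


def parse_related_query(buf):
    out = {"radius": None, "query": None, "query_hits": None, "vurls": []}
    for number, value in fields(buf):
        if number == 1:
            out["radius"] = value
        elif number == 2:
            out["query"] = value.decode("utf-8", "replace")
        elif number == 3:
            out["query_hits"] = value
        elif number == 4:                      # visited_urls wrapper
            out["vurls"] = [parse_visited_url(v)
                            for n, v in fields(value) if n == 1]
    return out


def parse_record(buf):
    """Return the related_query entries found under field 4 of the record."""
    queries = []
    for number, value in fields(buf):
        if number == 4 and isinstance(value, bytes):
            queries += [parse_related_query(v)
                        for n, v in fields(value) if n == 1]
    return queries
```

The walker does not validate its input, so a body that is not a protobuf message (an HTML page, for instance) should be filtered out before calling parse_record.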

  • Error codes: errors are reported as HTML error pages.

  • Repeat the HTTP POST for every Seeks node in the ring of interest, and merge the results.
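
How results are merged is left to the developer. The sketch below shows one possible scheme, which is an assumption and not what Seeks itself does: each URL's hits are summed across nodes, with hits from queries at a larger similarity radius weighted down by 1 / (1 + radius). It assumes every node's answer has already been decoded into plain dictionaries that mirror the related_query and visited_url fields; merge_results is a hypothetical name.

```python
def merge_results(per_node_queries):
    """Merge decoded answers from several nodes into one ranked URL list.

    per_node_queries: iterable with one list of related_query dicts per node,
    each dict holding "radius" and "vurls" (a list of visited_url dicts).
    """
    merged = {}
    for queries in per_node_queries:
        for related in queries:
            weight = 1.0 / (1 + related["radius"])  # closer queries count more
            for visited in related["vurls"]:
                entry = merged.setdefault(visited["url"], {
                    "url": visited["url"],
                    "title": visited.get("title"),
                    "summary": visited.get("summary"),
                    "score": 0.0,
                })
                entry["score"] += weight * visited["hits"]
                # keep the first non-empty title and summary seen
                entry["title"] = entry["title"] or visited.get("title")
                entry["summary"] = entry["summary"] or visited.get("summary")
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)
```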