A fresh addition to the existing Seeks plugins is the so-called ‘readability’ functionality. Readability removes the clutter (navigation, ads, sidebars) from webpages and, in most cases, correctly extracts the actual content of a page.
Seeks now embeds a C implementation of the readability algorithm by Alberto Garcia, which we turned into a plugin for Seeks. The plugin defines:
- An API call under the /readable resource. Usage:
/readable?url=your_url_encoded_url
Example:
http://s.s/readable?url=http%3A%2F%2Fseeks-project.info%2F
The API call is available for both the HTTP server and the proxy modes.
- The rendering of search results now embeds a ‘Readable’ link that returns a ‘readable’ version of the result page.
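A client calling the /readable API has to percent-encode the target URL, including its ':' and '/' characters, before passing it as the url parameter. The minimal sketch below builds such a call with Python's standard library. The helper name build_readable_url is hypothetical, and the base address "http://s.s" is simply the host from the example above; substitute the address of your own Seeks node.

```python
from urllib.parse import urlencode


def build_readable_url(seeks_base: str, page_url: str) -> str:
    """Build a /readable API call for page_url (hypothetical helper).

    seeks_base is the address of the Seeks node, assumed here to be
    something like "http://s.s", as in the example URL.
    """
    # urlencode percent-encodes reserved characters such as ':' and '/',
    # producing the url-encoded form the /readable resource expects.
    query = urlencode({"url": page_url})
    return seeks_base.rstrip("/") + "/readable?" + query


if __name__ == "__main__":
    print(build_readable_url("http://s.s", "http://seeks-project.info/"))
```

Run as a script, it prints the same call as the example above, http://s.s/readable?url=http%3A%2F%2Fseeks-project.info%2F, which any HTTP client can then fetch.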
Embedding a ‘readability’ plugin serves several purposes:
- Make it more comfortable to sample search results;
- Offer an API to pre-process any Web page directly through the proxy (a regexp configuration file will soon be added to select which pages are made ‘readable’ automatically during navigation);
- Most notably, pave the way for more and better machine learning and similarity analysis functionalities, which will make good use of the reduced clutter in fetched Web data.
The ‘readable’ plugin is available in the 0.4.1 experimental branch. See the installation page for more information, and be sure to check the project documentation.
The development ticket contains more detailed information about the plugin and its implementation.

