Seeks On Web
Once you have set up a Seeks proxy and have it running locally or on a remote server, you can make it available to the public through a Web site if you wish.
This allows users to submit their queries to Seeks from a web page, as they would with a traditional search engine.
There are two ways to do so. The first uses a very light HTTP server built in as a plugin in versions of Seeks >= 0.2.3. The second requires an external webserver. The built-in plugin gives your node better performance. The external webserver is heavier but has some advantages, especially:
- it allows the use of SSL, whereas the HTTP server plugin, which is based on libevent, has no support for encryption yet.
- it allows running the webserver on a different machine from the one running Seeks, something that cannot be done with the built-in plugin.
Requirements for option 1, the built-in plugin for Seeks >= 0.2.3:
- a running Seeks proxy.
Requirements for option 2, an external webserver:
- a running Seeks proxy,
- a running webserver,
- a script to route Web queries to the proxy (optional if using nginx as webserver, see below).
Option 1, built-in plugin
Again, you need Seeks >= 0.2.3. The server requires libevent to run.
When compiling Seeks, enable the HTTP server plugin:
./configure --enable-httpserv-plugin=yes --with-libevent=/your/path/to/libevent
Then compile. Before running, you must add the following to your src/proxy/config file:
activated-plugin httpserv
Then run Seeks. At startup you should see a line indicating that the webserver is running.
By default the server runs on localhost:8080. You can change this behavior by editing
src/plugins/httpserv/httpserv-config
from the sources.
On public nodes, it is recommended to use a robots.txt file to keep crawlers from hitting your websearch node. The robots.txt file must be put in the websearch public directory. If you are running Seeks from the source repository, add your robots.txt to
src/public/
If you have installed Seeks in your home directory or on your system, add your robots.txt file to
<your_install_repository>/share/seeks/public/
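As an illustration of the robots.txt step, here is a minimal Python sketch that writes a deny-all robots.txt into a public directory and checks the rules with the standard library parser. The function names are hypothetical, and the deny-all policy is an assumption about what a node operator wants; adjust the rules to your needs.

```python
from pathlib import Path
from urllib.robotparser import RobotFileParser

# Assumed policy: ask every well-behaved crawler to stay away from the whole node.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def write_robots_txt(public_dir):
    """Write the deny-all robots.txt into public_dir and return its path."""
    target = Path(public_dir) / "robots.txt"
    target.write_text(ROBOTS_TXT, encoding="ascii")
    return target


def crawler_allowed(robots_text, url, agent="*"):
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)
```

Calling write_robots_txt("src/public") would create the file when running from the source repository. Keep in mind that robots.txt is only advisory: crawlers that ignore it are not stopped by it.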
Option 2, external webserver
You must set up the webserver yourself. The required scripts are given below, for Django or a PHP framework; pick the one you prefer. For beginners, we recommend the PHP script.
Django
settings.py
SEEKS_PROXY = 'http://localhost:8118'
SEEKS_URI = 'http://s.s/'
SEEKS_PATH = 'seeks/'
urls.py
from django.conf import settings
[...]
(r'^%s(?P<path>.*)$' % settings.SEEKS_PATH, 'PROJECTNAME.seeks.views.seeks'),
seeks/views.py
DEPRECATED: there is no updated Python script for versions >= 0.3 yet. If you write one, let us know.
Use the PHP script or the built-in webserver instead.
Use this script for versions of Seeks lower than Bubs-0.2-beta, and for versions < 0.3 the script below.
# Copyright Camille Harang
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see http://www.fsf.org/licensing/licenses/agpl-3.0.html.
# Python 2 code (urllib2, urlparse), written for the Django releases of that time.
import httplib
import urllib2
from urlparse import urljoin
from django.conf import settings
from django.http import HttpResponse, HttpResponseRedirect, HttpResponseServerError

def seeks(request, path):
    # Send requests for the bare Seeks path to the home page.
    if path == '':
        return HttpResponseRedirect('websearch-hp')
    # Public URL of the node, passed on to Seeks in a request header.
    # ROOT_URL is not among the settings listed above and must also be defined.
    public_url = urljoin(settings.ROOT_URL, settings.SEEKS_PATH)
    # Internal URL, answered by the Seeks proxy itself.
    local_url = urljoin(settings.SEEKS_URI, path)
    if request.META.get('QUERY_STRING'):
        local_url = '%s?%s' % (local_url, request.META['QUERY_STRING'])
    # Route the request through the Seeks proxy.
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': settings.SEEKS_PROXY}))
    headers = [('Seeks-Remote-Location', public_url)]
    if 'HTTP_ACCEPT_LANGUAGE' in request.META:
        headers.append(('Accept-Language', request.META['HTTP_ACCEPT_LANGUAGE']))
    opener.addheaders = headers
    try:
        o = opener.open(urllib2.Request(local_url))
        info = o.info()
        # Keep only the media type, dropping parameters such as the charset.
        mime = ''
        if 'content-type' in info:
            mime = info['content-type'].split(';')[0]
    except urllib2.HTTPError as err:
        return HttpResponseServerError('ERROR %s' % err)
    except urllib2.URLError as err:
        return HttpResponseServerError('ERROR %s' % err.reason)
    except httplib.BadStatusLine as err:
        return HttpResponseServerError('ERROR %s' % err)
    except Exception:
        return HttpResponseServerError('ERROR')
    return HttpResponse(o.read(), mimetype=mime)
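The core of the view above is the mapping from a public request to an internal Seeks URL plus the headers sent along with it. The following Python 3 sketch isolates that mapping using only the standard library. It is an illustration, not an updated view for Seeks >= 0.3; the function names are hypothetical and do not appear in the original script.

```python
from urllib.parse import urljoin


def build_local_url(seeks_uri, path, query_string=""):
    """Map the path of a public request onto the internal Seeks URI."""
    local_url = urljoin(seeks_uri, path)
    if query_string:
        # Forward the query string unchanged, as the view above does.
        local_url = "%s?%s" % (local_url, query_string)
    return local_url


def build_headers(public_url, accept_language=None):
    """Headers sent to the proxy: the public location, plus the language if known."""
    headers = [("Seeks-Remote-Location", public_url)]
    if accept_language:
        headers.append(("Accept-Language", accept_language))
    return headers
```

For example, build_local_url('http://s.s/', 'search', 'q=seeks') yields 'http://s.s/search?q=seeks', which is then fetched through the proxy configured in SEEKS_PROXY.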
PHP
Dependencies
- http://php.net/curl (debian/ubuntu: apt-get install php5-curl)
Code
Beware: scripts written for versions of Seeks < 0.3 (including the one for versions lower than Bubs-0.2-beta) will not work properly with versions >= 0.3. For those versions, use the script that is part of the source distribution instead:
seeks-x.x/resources/search.php
You may need to modify the script to match your configuration.
If you installed Seeks from a package for your distribution, the script may not have been packaged with it (packaging is outside our control). In that case, you can find its latest stable version here: search.php
Anonymization
Queries to Seeks logged on /dev/null
On *nix systems:
./seeks 2> /dev/null
Apache
# Set the appropriate pattern matching Seeks's location on your server
SetEnvIf Request_URI "^/seeks/" seeks
# Send the log entries of Seeks requests to /dev/null
CustomLog /dev/null combined env=seeks
# Log everything else to your usual logging file
CustomLog /var/www/access.log combined env=!seeks
Lighttpd
$HTTP["host"] =~ "^your_node_address$"
{
accesslog.filename = "/dev/null"
server.errorlog = "/dev/null"
}
Nginx without script
If you use nginx as your front webserver, you can simply use the following configuration:
location /search/ {
    rewrite ^/search/$ /websearch-hp break;
    proxy_pass http://localhost:8250/;
    proxy_set_header Host s.s;
    proxy_set_header Seeks-Remote-Location http://my.seeks-node.net/search;
}
With this configuration the PHP script is no longer needed, and the main page is accessible at http://my.seeks-node.net/search/
Tips
How to prevent Seeks from crashing
- Help debugging it?
- Run it endlessly (cool in a screen):
while true ; do ./seeks ; done
- Another way of doing the same thing, with cron and Seeks running as a daemon:
Run Seeks with the following arguments, so that it writes a PID file:
./seeks --daemon --pidfile /var/run/seeks.pid
Then add the following line to your system crontab file (for example /etc/crontab, since the line includes a user field, here root):
*/5 * * * * root [ ! -f /var/run/seeks.pid -o -z "$(cat /var/run/seeks.pid 2>/dev/null )" -o ! -d "/proc/$(cat /var/run/seeks.pid 2>/dev/null)" ] && cd seekpath && ./seeks --daemon --pidfile /var/run/seeks.pid
where seekpath is the path to your version of Seeks. Every 5 minutes, this checks whether Seeks has died and restarts it if so.
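The liveness test performed by the cron entry (is there a PID file, is it non-empty, does the process still exist) can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and instead of looking for a /proc/<pid> directory as the cron line does, the sketch probes the process with signal 0, which is a different technique with the same intent on Unix systems.

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing, empty or invalid."""
    try:
        with open(pidfile) as handle:
            text = handle.read().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def is_running(pidfile):
    """Return True if the process recorded in pidfile still exists."""
    pid = read_pid(pidfile)
    if pid is None or pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A watchdog would call is_running('/var/run/seeks.pid') and restart Seeks only when it returns False. Like the /proc test, this cannot tell whether the PID has been reused by an unrelated process.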
On public nodes, it is recommended you use a robots.txt to block crawlers that may try to hit your websearch node and stress it for no purpose.
Built-in http server and lighttpd as a reverse-proxy
To use lighttpd as a reverse-proxy and have faster results, you can use the built-in HTTP server plugin along with the following lighttpd configuration snippet:
$HTTP["host"] =~ "seeks.zat.im" {
    proxy.server = ( "" => (( "host" => "127.0.0.1", "port" => 8080 )) )
    setenv.add-request-header = (
        "Seeks-Remote-Location" => "http://seeks.zat.im"
    )
}
Replace "seeks.zat.im" with your own host name, and 8080 with the port your Seeks HTTP server listens on.
Version with SSL support:
$HTTP["host"] == "seeks.sileht.net" {
    $HTTP["scheme"] == "https" {
        proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
        setenv.add-request-header = (
            "Seeks-Remote-Location" => "https://seeks.sileht.net"
        )
    } else $HTTP["scheme"] == "http" {
        proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
        setenv.add-request-header = (
            "Seeks-Remote-Location" => "http://seeks.sileht.net"
        )
    }
}
Replace "seeks.sileht.net" with your own host name, and 8080 with the port your Seeks HTTP server listens on.
Built-in http server and apache as a reverse-proxy + SSL
You can use apache as a reverse proxy to your seeks built-in HTTP server plugin, listening on 127.0.0.1:8080. You need to activate some apache modules to work as a transparent proxy:
a2enmod proxy
a2enmod proxy_http
a2enmod headers
a2enmod rewrite
And, if you want ssl support, you'll need the ssl module, and a valid ssl certificate.
a2enmod ssl
Then create a file for the Seeks virtual host (namely /etc/apache2/sites-available/seeks) and use the following as its configuration:
<VirtualHost *:80>
    ServerAdmin admin@domain.tld
    ServerName seeks.domain.tld
    RewriteEngine on
    RewriteCond %{HTTPS} off
    # No / at the end of the server name
    RewriteRule (.*) https://seeks.domain.tld%{REQUEST_URI}
</VirtualHost>
# And now, the SSL part starts below
<VirtualHost *:443>
    ServerAdmin admin@domain.tld
    ServerName seeks.domain.tld
    SSLEngine on
    # Use a valid certificate and the associated key
    SSLCertificateFile /etc/ssl/certs/seeks.domain.tld.pem
    SSLCertificateKeyFile /etc/ssl/private/seeks.domain.tld.key
    RequestHeader add Seeks-Remote-Location "https://seeks.domain.tld"
    # We do not want to act as a forward proxy
    ProxyRequests off
    ProxyPreserveHost on
    # Pass the root of https://seeks.domain.tld/ to the HTTP server embedded into Seeks
    ProxyPass / http://127.0.0.1:8080/
    # Same for the responses coming back
    ProxyPassReverse / http://127.0.0.1:8080/
    DocumentRoot /path/to/your/seeks/src
    <Location />
        Order allow,deny
        Allow from all
    </Location>
    ErrorLog /var/log/apache2/error.log
    # Possible values include: debug, info, notice, warn, error, crit,
    # alert, emerg.
    LogLevel warn
    # Don't want to log queries: log nothing, except errors
    CustomLog /dev/null defaults
</VirtualHost>
Then, don't forget to activate the site
a2ensite seeks
And to restart apache.
Set up an access control list for an open proxy
It is recommended you control who can use your external proxy if it is open for connection by outsiders.
To do so, modify the options permit-access and deny-access in the proxy configuration file (in the sources in src/config).
See the detailed configuration in the proxy config file itself.
