solr search server with tika and manifoldcf

I finally got my search server running, with solr as the main engine and tika for text extraction. The setup is completed by manifoldcf, which provides access to files, emails, wikis, RSS feeds and the web.

solr

A short overview on the basic file structure of solr is shown below:

filestructure


<solr-home-directory>/
   solr.xml
   core_name1/
      core.properties
      conf/
         solrconfig.xml
         managed-schema
      data/

And here is my core.properties file. It is very basic and targets a single server without solr cloud.

core.properties


name=collection name
config=solrconfig.xml
dataDir=collection name/data

schema fields from tika

The following fields are essential for my setup:

  • id – the identifier unique for solr
  • _version_ – used internally by solr for optimistic concurrency and the update log
  • content – the text representation of the extraction results from tika
  • ignored_* – as a catchall for any metadata that is not covered by a field in the index
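The two tika-related fields from the list can be declared through the Solr Schema API. What follows is only a minimal sketch under assumptions: the field types text_general and ignored are taken from Solr's _default configset, the attribute values are illustrative, and schema_payload and the core name mycore are hypothetical names. id and _version_ are already part of the default schema, so they are not repeated.

```shell
#!/bin/sh
# Print a Schema API request body that declares the two tika-related fields.
# Assumption: the field types "text_general" and "ignored" exist in the
# schema, as they do in Solr's _default configset.
schema_payload() {
    cat <<'EOF'
{
  "add-field": {"name": "content", "type": "text_general", "indexed": true, "stored": true},
  "add-dynamic-field": {"name": "ignored_*", "type": "ignored"}
}
EOF
}

# Usage (hypothetical core name "mycore"):
#   schema_payload | curl -X POST -H 'Content-type:application/json' \
#       --data-binary @- http://localhost:8983/solr/mycore/schema
```

Set stored to false for content if the extracted text never has to be displayed or highlighted in search results.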

The solr installation follows the instructions given by the project team. As I am using debian, the solr.in.sh is almost standard. Here are the settings:


SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j2.xml"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

Solr is started via the old init.d style script from the project team. No modifications here.

The specific managed-schema and solrconfig.xml files are not listed here, but getting them to work took the most time. Some comments:

  • look at the metadata extracted by tika to find the fields that are worth a second look
  • check the configuration given in /var/solr/data/conf/
  • check especially the solr log at /var/solr/logs/solr.log
  • managed-schema should be adjusted for the metadata retrieved through tika
  • delete any old collection files by removing /var/solr/data/collection name/collection name/index/
  • solr cell is responsible for importing/indexing files in foreign formats like PDF, Word, etc
  • set stored false as often as possible
  • set indexed false as much as possible
  • remove copyfields as far as possible
  • set indexed false for text_general
  • use catchall field for indexing
  • start JVM in server mode
  • set logging on higher level only
  • integrate everything in tomcat
  • set indexed or docValues to true but not both
  • some field type annotations: Solr Manual 8.11
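To try solr cell by hand, a single file can be posted to the extracting request handler. This is a sketch and not necessarily how my setup feeds documents: it assumes the handler is registered at /update/extract in solrconfig.xml, and extract_url, mycore and doc1 are hypothetical names. The parameters uprefix=ignored_ and fmap.content=content send unknown metadata to the catchall field and the extracted text to content.

```shell
#!/bin/sh
# Build the URL for posting one file to the Solr Cell handler.
# $1 = core name, $2 = document id (both supplied by the caller;
# the id is not URL-encoded here, so keep it simple).
extract_url() {
    printf 'http://localhost:8983/solr/%s/update/extract?literal.id=%s&uprefix=ignored_&fmap.content=content&commit=true\n' "$1" "$2"
}

# Usage (hypothetical core, id and file):
#   curl "$(extract_url mycore doc1)" -F "myfile=@/path/to/file.pdf"
```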

some interesting commands (relative to the solr installation directory)

  • bin/solr start
  • bin/solr stop -all
  • bin/post -c collection input
  • bin/solr delete -c collection
  • bin/solr create -c collection -d configdir

velocity setup

velocity may be used as a search interface for solr but my setup is not completed yet.

tika

The tika server is also installed as described by the project team. I only added a systemd unit file to start it:


    [Unit]
    Description=Apache Tika Server
    After=network.target

    [Service]
    Type=simple
    User=tika
    Environment="TIKA_INCLUDE=/etc/default/tika.in.sh"
    ExecStart=/usr/bin/java -jar /opt/tika/tika-server-standard-2.3.0.jar --port 9998 --config /opt/tika/tika-config.xml
    Restart=always

    [Install]
    WantedBy=multi-user.target

The tika.in.sh is once again copied from the project team's suggestion without modifications:


    TIKA_PID_DIR="/var/tika"
    LOG4J_PROPS="/var/tika/log4j.properties"
    TIKA_LOGS_DIR="/var/tika/logs"
    TIKA_PORT="9998"
    TIKA_FORKED_OPTS=""

The tika-config.xml is quite empty at the moment, but I hope to get logging running soon.
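To check that the server answers, and to see which metadata keys tika reports for a sample file (useful when choosing schema fields), the REST endpoints can be called with curl. A small sketch, assuming the server listens on localhost; tika_url is a hypothetical helper and sample.pdf a placeholder file:

```shell
#!/bin/sh
# Print the URL of a tika server endpoint.
# $1 = endpoint: "tika" (extracted text), "meta" (metadata) or "version"
# The port comes from TIKA_PORT and falls back to 9998.
tika_url() {
    case "$1" in
        tika|meta|version)
            printf 'http://localhost:%s/%s\n' "${TIKA_PORT:-9998}" "$1"
            ;;
        *)
            echo "unknown endpoint: $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   curl "$(tika_url version)"
#   curl -T sample.pdf -H 'Accept: application/json' "$(tika_url meta)"
#   curl -T sample.pdf -H 'Accept: text/plain' "$(tika_url tika)"
```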

ManifoldCF

And finally the manifoldcf installation from scratch, as the interface to the various information sources.

And here is my systemd unit file:

    [Unit]
    Description=ManifoldCF service
    [Service]
    WorkingDirectory=/opt/manifoldcf/example
    ExecStart=/usr/bin/java -Xms512m -Xmx512m -Dorg.apache.manifoldcf.configfile=./properties.xml -Dorg.apache.manifoldcf.jettyshutdowntoken=secret_token -Djava.security.auth.login.config= -cp .:../lib/mcf-core.jar:../lib/mcf-agents.jar:../lib/mcf-pull-agent.jar:../lib/mcf-ui-core.jar:../lib/mcf-jetty-runner.jar:../lib/jetty-client-9.4.25.v20191220.jar:../lib/jetty-continuation-9.4.25.v20191220.jar:../lib/jetty-http-9.4.25.v20191220.jar:../lib/jetty-io-9.4.25.v20191220.jar:../lib/jetty-jndi-9.4.25.v20191220.jar:../lib/jetty-jsp-9.2.30.v20200428.jar:../lib/jetty-jsp-jdt-2.3.3.jar:../lib/jetty-plus-9.4.25.v20191220.jar:../lib/jetty-schemas-3.1.M0.jar:../lib/jetty-security-9.4.25.v20191220.jar:../lib/jetty-server-9.4.25.v20191220.jar:../lib/jetty-servlet-9.4.25.v20191220.jar:../lib/jetty-util-9.4.25.v20191220.jar:../lib/jetty-webapp-9.4.25.v20191220.jar:../lib/jetty-xml-9.4.25.v20191220.jar:../lib/commons-codec-1.10.jar:../lib/commons-collections-3.2.2.jar:../lib/commons-collections4-4.2.jar:../lib/commons-discovery-0.5.jar:../lib/commons-el-1.0.jar:../lib/commons-exec-1.3.jar:../lib/commons-fileupload-1.3.3.jar:../lib/commons-io-2.5.jar:../lib/commons-lang-2.6.jar:../lib/commons-lang3-3.9.jar:../lib/commons-logging-1.2.jar:../lib/ecj-4.3.1.jar:../lib/gson-2.8.0.jar:../lib/guava-25.1-jre.jar:../lib/httpclient-4.5.8.jar:../lib/httpcore-4.4.10.jar:../lib/jasper-6.0.35.jar:../lib/jasper-el-6.0.35.jar:../lib/javax.servlet-api-3.1.0.jar:../lib/jna-5.3.1.jar:../lib/jna-platform-5.3.1.jar:../lib/json-simple-1.1.1.jar:../lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:../lib/juli-6.0.35.jar:../lib/log4j-1.2-api-2.4.1.jar:../lib/log4j-api-2.4.1.jar:../lib/log4j-core-2.4.1.jar:../lib/mail-1.4.5.jar:../lib/serializer-2.7.1.jar:../lib/slf4j-api-1.7.25.jar:../lib/slf4j-simple-1.7.25.jar:../lib/velocity-1.7.jar:../lib/xalan-2.7.1.jar:../lib/xercesImpl-2.10.0.jar:../lib/xml-apis-1.4.01.jar:../lib/zookeeper-3.4.10.jar:../lib/javax.activation-1.2.0.jar:../lib/javax.activation-api-1.2.0.jar: -jar start.jar
    User=solr
    Type=simple
    SuccessExitStatus=143
    TimeoutStopSec=10
    Restart=on-failure
    RestartSec=10
    [Install]
    WantedBy=multi-user.target
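The long classpath in the unit above has to be edited by hand whenever a jar version changes. One possible alternative, shown here only as a sketch, is to let a small start script build the list from the lib directory. build_classpath is a hypothetical helper; it picks up every jar in the directory, which may be more than the hand-picked list above, so the result should be compared with the original before use.

```shell
#!/bin/sh
# Print a colon-separated classpath: "." followed by every jar in a directory.
# $1 = directory holding the jar files
build_classpath() {
    classpath="."
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded pattern if no jars exist
        classpath="$classpath:$jar"
    done
    printf '%s\n' "$classpath"
}

# Usage inside a start script run from /opt/manifoldcf/example:
#   exec /usr/bin/java -cp "$(build_classpath ../lib)" ... -jar start.jar
```

systemd does not expand shell constructs in ExecStart, so the unit would have to call such a start script instead of java directly.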

docker compose file

version: '3.4'
services:
   mysql:
     image: mysql:5.7
     restart: unless-stopped
     tty: true
     ports:
        - '3306:3306'
     volumes:
        - type: bind
          source: ./mysql
          target: /var/lib/mysql
     environment:
        MYSQL_DATABASE: root
        MYSQL_USER: root
        MYSQL_PASSWORD: root
        MYSQL_ROOT_PASSWORD: root
        SERVICE_TAGS: dev
        SERVICE_NAME: mysql
   php:
     build:
        context: .
        dockerfile: dockerfile
     container_name: php
     volumes:
        - type: bind
          source: ./src
          target: /var/www/html
     working_dir: /var/www/html
     command: php artisan serve --host=0.0.0.0 --port=8000
     ports:
        - '8000:8000'

dockerfile

FROM php:7.4-apache
COPY src/ /var/www/html
RUN pecl install xdebug && docker-php-ext-enable xdebug && docker-php-ext-install pdo_mysql

docker commands

docker-compose up -d
docker-compose ps
docker-compose stop

docker image ls

additions

Edit the provided .env file.
Run migration and seeding with docker-compose exec.
Note that to be able to migrate you need to run the command
docker-compose exec php php /var/www/html/artisan migrate

DPF – lcd4linux config

lcd4linux.conf

Layout 'Dockstar'

Display 'DPF'

Variables {
# Ticks:
second 1000
minute 60 * second

# Standard Dimensions:
linesize 53 # max line length, for status lines etc.
width100 51 # full width after padding (border)
width050 24 # 1/2 of full width
width033 17 # 1/3 of full width
width025 12 # 1/4 of full width
width010 5 # 1/10 of full width

# Colors:
# ToDo: make use of the alpha channel
black '000000'
white 'ffffff'
red 'ff0000'
darkblue '000066'
lightgray 'b2b2b2'
darkgray '191919'
barcolor0 '5f5fff'
barcolor1 'ff5f5c'

# To be set later by timers:
SyslogMsg 'Dummy'
}

Display dpf {
Driver 'DPF'
Port 'usb0'
Font '6x8'
Foreground white
Background darkblue
Basecolor darkblue
}

Widget System {
class 'Text'
expression '*** ' . uname('nodename') . ' '. netinfo::ipaddr('eth0') . ' ' . uname('machine') . ' ' . uname('release') . ' ***'
width linesize
align 'C'
update 0
Background lightgray
Foreground darkgray
}

Widget Time {
class 'Text'
expression strftime('%a, %d.%m.%Y -- %H:%M:%S', time()) . ' -- Up: ' . uptime('%d days %H:%M:%S')
width linesize
align 'C'
update 1 * second
Background lightgray
Foreground darkgray
}

Widget Busy {
class 'Text'
expression proc_stat::cpu('busy', 0.5 * second)
prefix 'Busy'
postfix '%'
width width050
precision 1
align 'R'
update 0.5 * second
}

Widget BusyBar {
class 'Bar'
expression proc_stat::cpu('busy', 0.5 * second)
expression2 proc_stat::cpu('system', 0.5 * second)
length width050
direction 'E'
update 0.5 * second
Background darkgray
BarColor0 barcolor0
BarColor1 barcolor1
}

Widget Load {
class 'Text'
expression loadavg(1)
prefix 'Load'
postfix loadavg(1) > 1.0 ? '!' : ' '
width width050
precision 1
align 'R'
update 0.5 * second
# Foreground loadavg(1) > 1.0 ? red : white
}

Widget LoadBar {
class 'Bar'
expression loadavg(1)
max 4.0
length width050
direction 'E'
update 0.5 * second
Background darkgray
BarColor0 barcolor0
BarColor1 barcolor1
}

Widget Disk {
class 'Text'
# disk.[rw]blk return blocks, we assume a blocksize of 512
# to get the number in kB/s we would do blk*512/1024, which is blk/2
# expression (proc_stat::disk('.*', 'rblk', 0.5 * second)+proc_stat::disk('.*', 'wblk', 0.5 * second))/2
# with kernel 2.6, disk_io disappeared from /proc/stat but moved to /proc/diskstats
# therefore you have to use another function called 'diskstats':
expression (diskstats('sd[a-z]$', 'read_sectors', 0.5 * second) + diskstats('sd[a-z]$', 'write_sectors', 0.5 * second)) / 2 / 1024
prefix 'Disk'
postfix ' MB/s'
width width050
precision 1
align 'R'
update 0.5 * second
}

Widget DiskBar {
class 'Bar'
#expression proc_stat::disk('.*', 'rblk', 0.5 * second)
#expression2 proc_stat::disk('.*', 'wblk', 0.5 * second)
# for kernel 2.6:
expression diskstats('sd[a-z]$', 'read_sectors', 0.5 * second) / 2 / 1024
expression2 diskstats('sd[a-z]$', 'write_sectors', 0.5 * second) / 2 / 1024
length width050
direction 'E'
update 0.5 * second
Background darkgray
BarColor0 barcolor0
BarColor1 barcolor1
}

Widget Eth0 {
class 'Text'
expression (netdev('eth0', 'Rx_bytes', 0.5 * second) + netdev('eth0', 'Tx_bytes', 0.5 * second)) * 8 / 1024 / 1024
prefix 'eth0'
postfix ' Mbit/s'
width width050
precision 1
align 'R'
update 0.5 * second
}

Widget Eth0Bar {
class 'Bar'
expression netdev('eth0', 'Rx_bytes', 0.5 * second) * 8 / 1024 / 1024
expression2 netdev('eth0', 'Tx_bytes', 0.5 * second) * 8 / 1024 / 1024
length width050
direction 'E'
update 0.5 * second
Background darkgray
BarColor0 barcolor0
BarColor1 barcolor1
}

Widget DVB {
class 'Text'
expression dvb('signal_strength') * 100
prefix 'DVB Signal'
postfix '%'
width width050
precision 1
align 'R'
update 0.5 * second
}

Widget DVBBar {
class 'Bar'
expression dvb('signal_strength')
expression2 dvb('snr')
# expression2 dvb('ber')
min 0
max 1
length width050
direction 'E'
update 0.5 * second
Background darkgray
BarColor0 barcolor0
BarColor1 barcolor1
}

Widget MemoryTitle {
class 'Text'
expression 'Memory'
width width050
align 'L'
update 0
}

Widget MemoryTotal {
class 'Text'
expression meminfo('MemTotal') / 1024
prefix 'Total '
postfix ' MB'
width width050
precision 0
align 'R'
update 0
}

Widget MemoryFree {
class 'Text'
expression (meminfo('MemFree') + meminfo('Cached')) / 1024
prefix 'Free '
postfix ' MB'
width width050
precision 0
align 'R'
update 1 * second
}

Widget MemorySwapped {
class 'Text'
expression (meminfo('SwapTotal') - meminfo('SwapFree')) / 1024
prefix 'Swap used '
postfix ' MB'
width width050
precision 0
align 'R'
update 1 * second
}

Widget HDDTempTitle {
class 'Text'
expression 'Disk Temperature°C'
width width050
align 'L'
update 0
}

Widget HDDTemp1 {
class 'Text'
expression hddtemp('/dev/sda')
width width010
precision 1
align 'R'
update 10 * second
}

Widget HDDTemp2 {
class 'Text'
expression hddtemp('/dev/sdb')
width width010
precision 1
align 'R'
update 10 * second
}

Widget HDDTemp3 {
class 'Text'
expression hddtemp('/dev/sdc')
width width010
precision 1
align 'R'
update 10 * second
}

Widget FSSpaceTitle {
class 'Text'
expression 'Disk Space available'
width width050
align 'L'
update 0
}

Widget FSSpace1 {
class 'Text'
expression statfs('/', 'bavail') * statfs('/', 'bsize') / 1024 / 1024 / 1024
prefix '/ (Root FS)'
postfix ' GB'
width width050
precision 2
align 'R'
update 10 * second
}

Widget FSSpace2 {
class 'Text'
expression statfs('/home', 'bavail') * statfs('/home', 'bsize') / 1024 / 1024 / 1024
prefix '/home'
postfix ' GB'
width width050
precision 2
align 'R'
update 10 * second
}

Widget FSSpace3 {
class 'Text'
expression statfs('/backup', 'bavail') * statfs('/backup', 'bsize') / 1024 / 1024 / 1024
prefix '/backup '
postfix ' GB'
width width050
precision 2
align 'R'
update 10 * second
}

Widget FSSpace4 {
class 'Text'
expression statfs('/mnt/platte', 'bavail') * statfs('/mnt/platte', 'bsize') / 1024 / 1024 / 1024
prefix '/platte '
postfix ' GB'
width width050
precision 2
align 'R'
update 10 * second
}

Widget FSSpace5 {
class 'Text'
expression statfs('/mnt/var', 'bavail') * statfs('/mnt/var', 'bsize') / 1024 / 1024 / 1024
prefix '/var '
postfix ' GB'
width width050
precision 2
align 'R'
update 10 * second
}

Widget ServicesTitle {
class 'Text'
expression 'Services'
width width100
align 'C'
Background lightgray
Foreground darkgray
}

Widget PortmapStatus {
class 'Text'
expression 'Portmap '
width width050
postfix strstr(exec('/etc/init.d/portmap status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SSHdStatus {
class 'Text'
expression 'SSHd '
width width050
postfix strstr(exec('/etc/init.d/ssh status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget RsyslogStatus {
class 'Text'
expression 'Rsyslog '
width width050
postfix strstr(exec('/etc/init.d/rsyslog status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget pyloadStatus {
class 'Text'
expression 'pyload'
width width050
postfix strstr(exec('/etc/init.d/pyload status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget NFSdStatus {
class 'Text'
expression 'NFSd '
width width050
postfix strstr(exec('/etc/init.d/nfs-kernel-server status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget LighttpdStatus {
class 'Text'
expression 'Lighttpd '
width width050
postfix strstr(exec('/etc/init.d/lighttpd status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget MiniDLNAStatus {
class 'Text'
expression 'MiniDLNA '
width width050
postfix strstr(exec('/etc/init.d/minidlna status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget MySQLStatus {
class 'Text'
expression 'MySQL '
width width050
postfix strstr(exec('/sbin/status mysql', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget MythTVStatus {
class 'Text'
expression 'MythTV Backend '
width width050
postfix strstr(exec('/sbin/status mythtv-backend', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget PostfixStatus {
class 'Text'
expression 'Postfix '
width width050
postfix strstr(exec('/etc/init.d/postfix status', 10 * second), 'not running') > 0 ? 'down!' : 'up'
update 10 * second
}

Widget ProFTPStatus {
class 'Text'
expression 'ProFTP '
width width050
postfix strstr(exec('/etc/init.d/proftpd status', 10 * second), 'not running') > 0 ? 'down!' : 'up'
update 10 * second
}

Widget SambaStatus {
class 'Text'
expression 'Samba '
width width050
postfix strstr(exec('/sbin/status smbd', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SambaStatusDS {
class 'Text'
expression 'Samba '
width width050
postfix strstr(exec('/etc/init.d/samba status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SSHStatus {
class 'Text'
expression 'SSH '
width width050
postfix strstr(exec('/sbin/status ssh', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SSHStatusDS {
class 'Text'
expression 'SSH '
width width050
postfix strstr(exec('/etc/init.d/ssh status', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SWRAIDStatus {
class 'Text'
expression 'SW RAID '
width width050
postfix strstr(exec('cat /proc/mdstat', 10 * second), '[UUU]') > 0 ? 'up' : 'attention!'
update 10 * second
}

Widget TwonkyStatus {
class 'Text'
expression 'Twonkymedia '
width width050
postfix strstr(exec('/sbin/status twonkymedia', 10 * second), 'running') > 0 ? 'up' : 'down!'
update 10 * second
}

Widget SyslogTitle {
class 'Text'
expression '/var/log/syslog'
width width100
align 'C'
Background lightgray
Foreground darkgray
}

Widget SetSyslogMsg {
class 'Timer'
expression SyslogMsg = exec('tail -n 1 /var/log/syslog', 2 * second)
active 1
update 2 * second
}

Widget Syslog1 {
class 'Text'
expression substr(SyslogMsg, 0 * width100, width100)
width width100
align 'L'
Background darkgray
update 2 * second
}

Widget Syslog2 {
class 'Text'
expression substr(SyslogMsg, 1 * width100, width100)
width width100
align 'L'
Background darkgray
update 2 * second
}

Widget Syslog3 {
class 'Text'
expression substr(SyslogMsg, 2 * width100, width100)
width width100
align 'L'
Background darkgray
update 2 * second
}

Widget Syslog4 {
class 'Text'
expression substr(SyslogMsg, 3 * width100, width100)
width width100
align 'L'
Background darkgray
update 2 * second
}

Widget Debug {
class 'Text'
# expression cfg('Layout')
expression '$Revision: 1.25 $ -- DPF Driver by hackfin'
width linesize
align 'C'
Foreground lightgray
}

Widget na {
class 'Text'
expression 'n/a'
width 3
align 'L'
}

Widget Bgnd {
class 'Image'
file '/usr/local/share/backgrounds/mythbuntu-320x240.png'
reload 0
update 0
inverted 0
visible 1
}

Widget BgndDS {
class 'Image'
file '/usr/local/share/backgrounds/dockstar-320x240.png'
reload 0
update 0
inverted 0
visible 1
}

Layout Ubuntu {
Row01.Col01 'System'
Row02.Col01 'Time'
Row04.Col02 'Busy'
Row05.Col02 'BusyBar'
Row07.Col02 'Load'
Row08.Col02 'LoadBar'
Row10.Col02 'Disk'
Row11.Col02 'DiskBar'
Row13.Col02 'Eth0'
Row14.Col02 'Eth0Bar'
Row16.Col02 'DVB'
Row17.Col02 'DVBBar'

Row04.Col28 'MemoryTitle'
Row05.Col29 'MemoryTotal'
Row06.Col29 'MemoryFree'
Row07.Col29 'MemorySwapped'
Row09.Col28 'HDDTempTitle'
Row10.Col29 'HDDTemp1'
Row10.Col35 'HDDTemp2'
Row10.Col41 'HDDTemp3'
Row12.Col28 'FSSpaceTitle'
Row13.Col29 'FSSpace1'
Row14.Col29 'FSSpace2'
Row15.Col29 'FSSpace3'
Row16.Col29 'FSSpace4'
Row17.Col29 'FSSpace5'

Row19.Col02 'ServicesTitle'
Row20.Col02 'ApacheStatus'
Row21.Col02 'MySQLStatus'
Row22.Col02 'MythTVStatus'
Row23.Col02 'PostfixStatus'
Row20.Col29 'SambaStatus'
Row21.Col29 'SSHStatus'
Row22.Col29 'SWRAIDStatus'
Row23.Col29 'TwonkyStatus'

Row25.Col02 'SyslogTitle'
Row26.Col02 'Syslog1'
Row27.Col02 'Syslog2'
Row28.Col02 'Syslog3'
Row29.Col02 'Syslog4'

Row30.Col01 'Debug'

Timer1 'SetSyslogMsg'

# Layer 2 {
# X1.Y1 'Bgnd'
# }

}

Layout Dockstar {
Row01.Col01 'System'
Row02.Col01 'Time'
Row04.Col02 'Busy'
Row05.Col02 'BusyBar'
Row07.Col02 'Load'
Row08.Col02 'LoadBar'
Row10.Col02 'Disk'
Row11.Col02 'DiskBar'
Row13.Col02 'Eth0'
Row14.Col02 'Eth0Bar'

Row04.Col28 'MemoryTitle'
Row05.Col29 'MemoryTotal'
Row06.Col29 'MemoryFree'
Row07.Col29 'MemorySwapped'
Row09.Col28 'HDDTempTitle'
Row10.Col29 'na'
Row12.Col28 'FSSpaceTitle'
Row13.Col29 'FSSpace1'
# Row14.Col29 'FSSpace2'
Row14.Col29 'FSSpace4'
Row15.Col29 'FSSpace5'

Row16.Col02 'ServicesTitle'
Row17.Col02 'PortmapStatus'
Row18.Col02 'SSHdStatus'
Row19.Col02 'RsyslogStatus'
Row20.Col02 'NFSdStatus'
Row21.Col02 'pyloadStatus'
# Row22.Col02 'PostfixStatus'
# Row23.Col02 'ProFTPStatus'
# Row17.Col29 'SambaStatusDS'
# Row18.Col29 'SSHStatusDS'

Row23.Col02 'SyslogTitle'
Row24.Col02 'Syslog1'
Row25.Col02 'Syslog2'
Row26.Col02 'Syslog3'
Row27.Col02 'Syslog4'

Row30.Col01 'Debug'

Timer1 'SetSyslogMsg'

# Layer 2 {
# X1.Y1 'BgndDS'
# }

}
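The Disk widget in the config above divides the sector count by 2 and then by 1024. With 512-byte sectors, which is what the comment in the config assumes, the first division yields kB and the second MB. A minimal sketch of that arithmetic; sectors_to_mb is a hypothetical helper and not part of lcd4linux:

```shell
#!/bin/sh
# Convert a number of 512-byte sectors to megabytes, one decimal place.
# $1 = sector count
sectors_to_mb() {
    # sectors / 2 = kB, kB / 1024 = MB; LC_ALL=C keeps "." as decimal point
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 2 / 1024 }'
}

# Usage:
#   sectors_to_mb 4096
```

For example, 4096 sectors are 2048 kB, that is 2.0 MB.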

rsnapshot backup solution

To back up my systems I have chosen rsnapshot because it stores all files as normal files on the hard disk and uses hard links between the different backup dates to reduce the total size of the backup. This enables me to look for old files directly in the file system. Normally I do not restore a complete backup but only check for specific files.
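The space saving comes from hard links: an unchanged file in two snapshots is one inode with two directory entries. The following sketch only imitates this with ln in a temporary directory and does not run rsnapshot itself; same_inode and hardlink_demo are hypothetical helper names.

```shell
#!/bin/sh
# Succeed when both paths refer to the same inode, i.e. they are
# hard links to one file. Inode numbers are read with "ls -i".
same_inode() {
    a=$(ls -i "$1" | awk '{print $1}')
    b=$(ls -i "$2" | awk '{print $1}')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Imitate two snapshot directories that share one unchanged file.
hardlink_demo() {
    dir=$(mktemp -d) || return 1
    mkdir "$dir/weekly.1" "$dir/weekly.0"
    echo "unchanged file" > "$dir/weekly.1/notes.txt"
    # The unchanged file is linked into the newer snapshot, not copied.
    ln "$dir/weekly.1/notes.txt" "$dir/weekly.0/notes.txt"
    if same_inode "$dir/weekly.1/notes.txt" "$dir/weekly.0/notes.txt"; then
        echo "shared"
    else
        echo "copied"
    fi
    rm -rf "$dir"
}

# Usage:
#   hardlink_demo
```

On a file system that supports hard links the demo reports that both snapshot entries share one inode, so the file occupies disk space only once.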
To simplify things further I found a description of how to use rsync on the client machines together with an SSH key of the server, so that no interaction with the client is needed. The server connects using its key and starts a local rsync process on the client which handles the backup itself. And now to the steps required to set up the system on the client:

CLIENT
First we need rsync installed:

aptitude install rsync

then we need to create a special backup user:

useradd backupuser -c "Backup" -m -u 4210

Within the home folder of the new user we need to store the server's public key as .ssh/authorized_keys

mkdir .ssh
scp /home/backupuser/.ssh/authorized_keys

Now we need a special script which starts the rsync program on the client for incoming requests:

scp /home/backupuser/

Finally some tweaking for the permissions:

chown backupuser:backupuser
chmod 755

We also need to allow the new user to execute rsync, so we adjust sudo for this by editing /etc/sudoers:

Defaults:backupuser !requiretty
Defaults:backupuser !visiblepw
backupuser ALL=NOPASSWD: /usr/bin/rsync

That's it!
I also wrote a simple script that shows a list of the rsnapshot config files stored on the backup server and starts the backup for the chosen config, that is, backs up one specific system.
Here are the relevant scripts:
First the wrapper script, which simply logs the current date and starts rsync using the command line options from the rsnapshot remote config:


#!/bin/sh

/usr/bin/logger "BACKUP: Start backup at `/bin/date` ";
/usr/bin/sudo /usr/bin/rsync "$@";
/usr/bin/logger "BACKUP: Backup finalized at `/bin/date` ";

And here is the console script to start the backup:


#!/bin/sh
#
DIALOG="/usr/bin/dialog"
RSNAPSHOT="/usr/bin/rsnapshot"
CONFIGDIR=""
BACKUPDIR=""
CONFIGFILES=""

##### Check if Backupdisk is connected and correctly mounted ################
check_for_backup_dir() {
    if [ ! -e "$BACKUPDIR/ps2" ]
    then
        if [ $# -ne 0 ]
        then
            $DIALOG --clear --title "rsnapshot config" --msgbox "The backup disk is not connected to the machine!" 10 52
        else
            echo "The backup disk is not connected to the machine!"
        fi
        # leave with an error status so callers can detect the failure
        exit 1
    fi
}

##### main ######
# choose dialog interface or command line parameter
if [ $# -ne 0 ]
then
    if [ $# -gt 1 ]
    then
        echo "Usage: backup.sh or backup.sh "
        echo " too many arguments on command line"
        exit 1
    fi
    check_for_backup_dir
    # command parameter available so no dialog interface used
    if [ ! -e "$CONFIGDIR/$1" ]
    then
        echo "No config file for system $1"
        exit 1
    else
        $RSNAPSHOT -q -c "$CONFIGDIR/$1" weekly
    fi
else
    check_for_backup_dir dialog
    # no parameter available on command line so using dialog for choosing configuration
    for file in "$CONFIGDIR"/*
    do # create one variable including all available configs
        CONFIGFILES="$CONFIGFILES $file ! "
    done
    TEMPFILE=`tempfile 2>/dev/null` || TEMPFILE=/tmp/test$$
    # remove the temp file on exit and on the usual signals
    trap "rm -f $TEMPFILE" 0 1 2 5 15
    $DIALOG --clear --title "rsnapshot config" --menu "Choose rsnapshot configuration:" 18 50 10 $CONFIGFILES 2> "$TEMPFILE"
    RUECKGABE=$?
    case $RUECKGABE in
        0)
            AUSWAHL=`cat "$TEMPFILE"`
            ### execute backup command
            $RSNAPSHOT -q -c "$AUSWAHL" weekly
            ;;
        1)
            echo "Backup canceled!"
            ;;
        255)
            echo "ESC pressed!"
            ;;
    esac
fi
# clean up and exit after all work is done
exit

The rsnapshot config files always look like the original ones delivered with rsnapshot except for the ssh section which starts the remote wrapper script shown above:


#################################################
# rsnapshot.conf - rsnapshot configuration file #
#################################################
# #
# PLEASE BE AWARE OF THE FOLLOWING RULES: #
# #
# This file requires tabs between elements #
# #
# Directories require a trailing slash: #
# right: /home/ #
# wrong: /home #
# #
#################################################
#
#######################
# CONFIG FILE VERSION #
#######################

config_version 1.2

###########################
# SNAPSHOT ROOT DIRECTORY #
###########################

# All snapshots will be stored under this root directory.
snapshot_root

# If no_create_root is enabled, rsnapshot will not automatically create the
# snapshot_root directory. This is particularly useful if you are backing
# up to removable media, such as a FireWire drive.
#
no_create_root 1

#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp /bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
cmd_rm /bin/rm

# rsync must be enabled for anything to work.
cmd_rsync /usr/bin/rsync

# Uncomment this to enable remote ssh backups over rsync.
cmd_ssh /usr/bin/ssh

# Comment this out to disable syslog support.
cmd_logger /usr/bin/logger

# Uncomment this to specify a path to "du" for disk usage checks.
cmd_du /usr/bin/du

#########################################
# BACKUP INTERVALS #
# Must be unique and in ascending order #
# i.e. hourly, daily, weekly, etc. #
#########################################

# The interval names (hourly, daily, ...) are just names and have no influence
# on the length of the interval. The numbers set the number of snapshots to
# keep for each interval (hourly.0, hourly.1, ...).
# The length of the interval is set by the time between two executions of
# rsnapshot , this is normally done via cron.
# Feel free to adapt the names, and the sample cron file under /etc/cron.d/rsnapshot
# to your needs. The only requirement is that the intervals must be listed
# in ascending order. To activate just uncomment the entries.

interval weekly 5

############################################
# GLOBAL OPTIONS #
# All are optional, with sensible defaults #
############################################

# If your version of rsync supports --link-dest, consider enable this.
# This is the best way to support special files (FIFOs, etc) cross-platform.
# The default is 0 (off).
# In Debian GNU cp is available which is superior to link_dest, so it should be
# commented out (disabled).
#
#link_dest 0

# Verbose level, 1 through 5.
# 1 Quiet Print fatal errors only
# 2 Default Print errors and warnings only
# 3 Verbose Show equivalent shell commands being executed
# 4 Extra Verbose Show extra verbose information
# 5 Debug mode More than you care to know
#
verbose 2

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
loglevel 3

# If you enable this, data will be written to the file you specify. The
# amount of data written is controlled by the "loglevel" parameter.
logfile

# The include and exclude parameters, if enabled, simply get passed directly
# to rsync. If you have multiple include/exclude patterns, put each one on a
# separate line. Please look up the --include and --exclude options in the
# rsync man page for more details.
#

exclude /dev
exclude /lost+found
exclude /media
exclude /mnt
exclude /proc
exclude /run
exclude /sys
exclude /tmp

# The include_file and exclude_file parameters, if enabled, simply get
# passed directly to rsync. Please look up the --include-from and
# --exclude-from options in the rsync man page for more details.
#
#include_file /path/to/include/file
#exclude_file /path/to/exclude/file

# Default rsync args. All rsync commands have at least these options set.
#
rsync_long_args -v --numeric-ids --relative -ev --rsync-path=/home/backupuser/rsync-wrapper.sh

# ssh has no args passed by default, but you can specify some here.
#
ssh_args -i

# Default arguments for the "du" program (for disk space reporting).
# The GNU version of "du" is preferred. See the man page for more details.
#
du_args -csh

# If this is enabled, rsync won't span filesystem partitions within a
# backup point. This essentially passes the -x option to rsync.
# The default is 0 (off).
#
#one_fs 0

# If enabled, rsnapshot will write a lockfile to prevent two instances
# from running simultaneously (and messing up the snapshot_root).
# If you enable this, make sure the lockfile directory is not world
# writable. Otherwise anyone can prevent the program from running.
#
lockfile /path-to-pid-file

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup @:/ root/

SERVER
We have already seen the backup script that I use to keep an overview of all my systems and to choose a specific system to back up. For this to run we of course need rsnapshot, rsync and dialog.

aptitude install rsync rsnapshot dialog

Then a dedicated user is required, which I will call backupserver:

useradd backupserver -c "Backup user on server" -m

For this user we need to create a key pair for ssh:

ssh-keygen

Follow the instructions and do not set a passphrase, because otherwise we would have to enter it every time a backup is taken.
Now we can distribute the public part of the key to all client machines and we are done.
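The key can also be created without any prompts. This is a sketch under assumptions: the key type ed25519, the comment backupserver and the helper name make_backup_key are my choices for illustration, and ssh-keygen still asks before overwriting an existing key file.

```shell
#!/bin/sh
# Create a passphrase-less SSH key pair and print the public key path.
# $1 = directory for the key pair (must exist)
make_backup_key() {
    keyfile="$1/id_ed25519"
    # -N "" sets an empty passphrase, -q suppresses the progress output
    ssh-keygen -q -t ed25519 -N "" -C "backupserver" -f "$keyfile" || return 1
    printf '%s\n' "$keyfile.pub"
}

# Usage, run as the backupserver user:
#   make_backup_key "$HOME/.ssh"
```

The content of the printed .pub file is what goes into /home/backupuser/.ssh/authorized_keys on each client.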
Finally I installed screen and used it as the default shell for the backupserver user, so that I can detach from the current connection to my headless backup server while the backup keeps running in the background.

aptitude install screen

I also add the following to /etc/screenrc which gives me a normal bash in debian when using screen:

shell /bin/bash

A reminder to myself:

Use the correct ownership and permissions for the backup folder so that the backupserver user can write the backup into it, and always use this user when performing backups. Do not use the root user for that, because it will destroy the permission structure in the backup folder and does not work with the key created earlier.

Xorg.conf for three screen setup

After some extended research I finally succeeded in getting my server setup with three screens running. My system currently consists of a Radeon HD6570 with two Lenovo L171 and a GeForce 6200 with a Lenovo L171p. The monitors have a resolution of 1280×1024 on a 17″ screen and the complete setup is using Xinerama.


Section "ServerLayout"
Identifier "three screen setup"
Screen 0 "screen-links" 0 0
Screen 1 "screen-mitte" RightOf "screen-links"
Screen 2 "screen-rechts" RightOf "screen-mitte"
InputDevice "Mouse0" "CorePointer"
InputDevice "Keyboard0" "CoreKeyboard"
Option "Xinerama" "1"
EndSection

Section "Files"
ModulePath "/usr/lib/xorg/modules"
FontPath "/usr/share/fonts/X11/misc"
FontPath "/usr/share/fonts/X11/cyrillic"
FontPath "/usr/share/fonts/X11/100dpi/:unscaled"
FontPath "/usr/share/fonts/X11/75dpi/:unscaled"
FontPath "/usr/share/fonts/X11/Type1"
FontPath "/usr/share/fonts/X11/100dpi"
FontPath "/usr/share/fonts/X11/75dpi"
FontPath "built-ins"
EndSection

Section "Module"
Load "glx"
EndSection

Section "InputDevice"
Identifier "Keyboard0"
Driver "kbd"
EndSection

Section "InputDevice"
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/input/mice"
Option "ZAxisMapping" "4 5 6 7"
EndSection

Section "Monitor"
Identifier "monitor-links"
VendorName "lenovo"
ModelName "L171 links"
EndSection

Section "Screen"
Identifier "screen-links"
Device "radeon-links"
Monitor "monitor-links"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "1280x1024"
# ViewPort 0 0
# Virtual 1280 1024
EndSubsection
EndSection

Section "Monitor"
Identifier "monitor-mitte"
VendorName "IBM"
ModelName "L171p mitte"
EndSection

Section "Screen"
Identifier "screen-mitte"
Device "radeon-mitte"
Monitor "monitor-mitte"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "1280x1024"
# ViewPort 0 0
# Virtual 1280 1024
EndSubsection
EndSection

Section "Monitor"
Identifier "monitor-rechts"
VendorName "lenovo"
ModelName "L171 rechts"
EndSection

Section "Screen"
Identifier "screen-rechts"
Device "nouveau"
Monitor "monitor-rechts"
DefaultDepth 24
SubSection "Display"
Depth 24
Modes "1280x1024"
# ViewPort 0 0
# Virtual 1280 1024
EndSubsection
EndSection

Section "Device"
### Available Driver options are:-
### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
### <percent>: "<f>%"
### [arg]: arg optional
#Option "SWcursor" # []
#Option "HWcursor" # []
#Option "NoAccel" # []
#Option "ShadowFB" # []
#Option "VideoKey" #
#Option "WrappedFB" # []
#Option "GLXVBlank" # []
#Option "ZaphodHeads" #
#Option "PageFlip" # []
#Option "SwapLimit" #
#Option "AsyncUTSDFS" # []
#Option "AccelMethod" #
Identifier "nouveau"
Driver "nouveau"
BusID "PCI:24:7:0"
Screen 0
Option "DynamicTwinView" "FALSE"
EndSection

Section "Device"
### Available Driver options are:-
### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
### <percent>: "<f>%"
### [arg]: arg optional
#Option "Accel" # []
#Option "SWcursor" # []
#Option "EnablePageFlip" # []
#Option "ColorTiling" # []
#Option "ColorTiling2D" # []
#Option "RenderAccel" # []
#Option "SubPixelOrder" # []
#Option "AccelMethod" #
#Option "EXAVSync" # []
#Option "EXAPixmaps" # []
#Option "ZaphodHeads" #
#Option "EnablePageFlip" # []
#Option "SwapbuffersWait" # []
Identifier "radeon-links"
Driver "radeon"
BusID "PCI:1:0:0"
Screen 0
Option "ZaphodHeads" "DVI-1"
EndSection

Section "Device"
### Available Driver options are:-
### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
### <percent>: "<f>%"
### [arg]: arg optional
#Option "Accel" # []
#Option "SWcursor" # []
#Option "EnablePageFlip" # []
#Option "ColorTiling" # []
#Option "ColorTiling2D" # []
#Option "RenderAccel" # []
#Option "SubPixelOrder" # []
#Option "AccelMethod" #
#Option "EXAVSync" # []
#Option "EXAPixmaps" # []
#Option "ZaphodHeads" #
#Option "EnablePageFlip" # []
#Option "SwapbuffersWait" # []
Identifier "radeon-mitte"
Driver "radeon"
BusID "PCI:1:0:0"
Screen 1
Option "ZaphodHeads" "DVI-0"
EndSection
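
A config like this depends on many cross references: the ServerLayout names Screens and InputDevices, and each Screen names a Device and a Monitor. A typo in one identifier is easy to miss, so a small check can help. The following is a rough sketch under simplifying assumptions (no `#` inside quoted strings, only the reference types listed in the code); the function names are made up for this example.

```python
import re


def parse_sections(text):
    """Split xorg.conf text into a list of (section_type, lines) tuples."""
    sections, current = [], None
    for raw in text.splitlines():
        # Drop comments; assumes '#' never appears inside a quoted string
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        start = re.match(r'Section\s+"([^"]+)"', line, re.I)
        if start:
            current = (start.group(1).lower(), [])
            sections.append(current)
        elif line.lower() == "endsection":
            current = None
        elif current is not None:
            current[1].append(line)
    return sections


def dangling_references(text):
    """Return (section_type, referenced_type, name) for every unknown identifier."""
    sections = parse_sections(text)
    known = {}
    for kind, lines in sections:
        for line in lines:
            ident = re.match(r'Identifier\s+"([^"]+)"', line, re.I)
            if ident:
                known.setdefault(kind, set()).add(ident.group(1))
    # Keywords that point from one section type to another
    patterns = {
        "screen": r'(Device|Monitor)\s+"([^"]+)"',
        "serverlayout": r'(Screen|InputDevice)\s+(?:\d+\s+)?"([^"]+)"',
    }
    problems = []
    for kind, lines in sections:
        pattern = patterns.get(kind)
        if pattern is None:
            continue
        for line in lines:
            ref = re.match(pattern, line, re.I)
            if ref and ref.group(2) not in known.get(ref.group(1).lower(), set()):
                problems.append((kind, ref.group(1).lower(), ref.group(2)))
    return problems
```

Feeding the text of xorg.conf to `dangling_references` returns an empty list when every referenced identifier is defined somewhere in the file.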

msmtp – config

To be able to send email from any host, I installed msmtp-mta on every host that does not run a full email server. The configuration is done in the file /etc/msmtprc.

Here are its contents:
account default
host cubietruck.steppenwolf.de
auto_from on
maildomain steppenwolf.de
tls on
tls_trust_file /etc/CA/cert.pem
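
To check that a host can actually send mail with this config, a test message can be piped to msmtp. This is a minimal sketch, not something from the original setup: the function names and the default subject and body are assumptions, and no From header is set on the assumption that msmtp derives the sender from auto_from and maildomain.

```python
import subprocess
from email.message import EmailMessage


def build_test_mail(recipient, subject="msmtp test", body="Test mail sent via msmtp.\n"):
    """Build a minimal message; no From header is set here."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_msmtp(msg):
    """Pipe the message to msmtp; -t reads the recipients from the headers."""
    subprocess.run(["msmtp", "-t"], input=msg.as_bytes(), check=True)
```

Calling `send_via_msmtp(build_test_mail(...))` with a real recipient address raises `CalledProcessError` if msmtp exits with an error, which makes TLS or host problems visible right away.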

rsync setup for client backup

I use rsync to keep the photos on my mobile phone in sync with my server. The system keeps growing and currently syncs photos, books, music and movies. I also wrote a very simple web interface to show the contents after syncing. Photos are stored on the server in a date-oriented form. The php script to handle the files on the server side may be found here.
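
The php script itself is not reproduced here. As an illustration of what a date-oriented layout can look like, here is a hypothetical sketch that maps a photo to a year/month folder. The `YYYY/MM` layout and the use of the modification time as capture date are assumptions, not necessarily what the script does.

```python
import os
from datetime import datetime


def dated_target(base_dir, filename, taken):
    """Return base_dir/YYYY/MM/filename for a photo taken at the given datetime."""
    return os.path.join(base_dir, f"{taken:%Y}", f"{taken:%m}", filename)


def target_for_file(base_dir, path):
    """Use the file's modification time as a stand-in for the capture date."""
    taken = datetime.fromtimestamp(os.path.getmtime(path))
    return dated_target(base_dir, os.path.basename(path), taken)
```
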
The app is installed from Google Play (https://play.google.com/store/apps/details?id=eu.kowalczuk.rsync4android&hl=en). Follow its instructions to generate a key and copy the public key to ~/.ssh/authorized_keys on the server. If that file already contains keys, append the new key instead of overwriting them.
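
Adding a key to an existing authorized_keys file boils down to appending one line unless it is already present. A minimal sketch of that logic follows; the function name is an assumption, and it works on the file content as a string so that reading and writing the file stay up to the caller.

```python
def add_public_key(authorized_keys_text, public_key):
    """Return the file content with public_key appended unless it is already there."""
    key = public_key.strip()
    lines = authorized_keys_text.splitlines()
    if key in (line.strip() for line in lines):
        return authorized_keys_text
    lines.append(key)
    return "\n".join(lines) + "\n"
```
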

The rsync options in the Android app should be set as follows:

rsync -vHrltD --exclude=thumbnails --chmod=ug+rwx,o-rwx --perms -e "ssh -y -p 22 -i "mobile@:BackupFromMobile/pictures

The same settings apply for the other folders, like:

  • pictures
  • ebooks
  • music
  • whatsapp
  • movies
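
Since only the target folder changes between the sync jobs, the argument list can be generated per folder. The sketch below reuses the options from the command above; the source directory, the key file and the server host are not given in the text, so they are hypothetical parameters here, and the function name is made up.

```python
FOLDERS = ["pictures", "ebooks", "music", "whatsapp", "movies"]


def rsync_command(folder, source_dir, key_file, host, port=22):
    """Build the rsync argument list for one folder (all paths are placeholders)."""
    ssh = f"ssh -y -p {port} -i {key_file}"
    return [
        "rsync", "-vHrltD",
        "--exclude=thumbnails",
        "--chmod=ug+rwx,o-rwx", "--perms",
        "-e", ssh,
        source_dir,
        f"mobile@{host}:BackupFromMobile/{folder}",
    ]
```

Looping over `FOLDERS` then gives one command per folder that differs only in the source directory and the last path component.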

OpenVPN Multihost Config

Here is my working access to my intranet via OpenVPN in MultiHost Mode. First the server config:

and now the client config:

    ##############################################
    # Sample client-side OpenVPN 2.0 config file #
    ##############################################
    # Specify that we are a client and that we
    # will be pulling certain config file directives
    # from the server.
    client
    # Use the same setting as you are using on
    # the server.
    dev tun
    # Are we connecting to a TCP or
    # UDP server? Use the same setting as
    # on the server.
    proto udp
    # The hostname/IP and port of the server.
    remote olkn.myvnc.com 1194
    # Keep trying indefinitely to resolve the
    # host name of the OpenVPN server. Very useful
    # on machines which are not permanently connected
    # to the internet such as laptops.
    resolv-retry infinite
    # Most clients don't need to bind to
    # a specific local port number.
    nobind
    # Downgrade privileges after initialization (non-Windows only)
    user nobody
    group nogroup
    # Try to preserve some state across restarts.
    persist-key
    persist-tun
    # If you are connecting through an
    # HTTP proxy to reach the actual OpenVPN
    # server, put the proxy server/IP and
    # port number here. See the man page
    # if your proxy server requires
    # authentication.
    ;http-proxy-retry # retry on connection failures
    ;http-proxy [proxy server] [proxy port #]
    # SSL/TLS parms.
    ca /etc/openvpn/ca.crt
    cert /etc/openvpn/thinkpad.crt
    key /etc/openvpn/thinkpad.key
    # Verify server certificate by checking
    # that the certificate has the nsCertType
    # field set to "server". This is an
    # important precaution to protect against
    # a potential attack discussed here:
    # http://openvpn.net/howto.html#mitm
    ns-cert-type server
    # If a tls-auth key is used on the server
    # then every client must also have the key.
    ;tls-auth ta.key 1
    # Enable compression on the VPN link.
    comp-lzo
    # Set log file verbosity.
    verb 3
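
A common reason for a client failing to start is a missing or misnamed ca, cert or key file. As a quick sanity check, the active directives can be read from the config and the referenced files listed. This is a minimal sketch, assuming that active lines carry no trailing comments; the function names are hypothetical.

```python
def parse_openvpn_config(text):
    """Return {directive: [argument lists]}, ignoring '#' and ';' comment lines."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def referenced_files(directives):
    """Collect the paths given to the ca, cert and key directives."""
    return [args[0] for name in ("ca", "cert", "key")
            for args in directives.get(name, []) if args]
```

Passing each returned path to `os.path.exists` shows whether the certificate files named in the client config are actually in place.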