web scraper

contents

  • logging
  • database access
  • solr indexing
  • filesystem access
  • web scraping

logging

database access

– mysql in python


import mysql.connector
# from mysql.connector import Error

# pip3 install mysql-connector-python
# https://dev.mysql.com/doc/connector-python/en/connector-python-reference.html

class DB():
    def __init__(self, config):
        self.connection = None
        self.connection = mysql.connector.connect(**config)
        
    def query(self, sql, args):
        cursor = self.connection.cursor()
        cursor.execute(sql, args)
        return cursor

    def insert(self,sql,args):
        cursor = self.query(sql, args)
        id = cursor.lastrowid
        self.connection.commit()
        cursor.close()
        return id

    # https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-executemany.html
    def insertmany(self,sql,args):
        cursor = self.connection.cursor()
        cursor.executemany(sql, args)
        rowcount = cursor.rowcount
        self.connection.commit()
        cursor.close()
        return rowcount

    def update(self,sql,args):
        cursor = self.query(sql, args)
        rowcount = cursor.rowcount
        self.connection.commit()
        cursor.close()
        return rowcount

    def fetch(self, sql, args):
        rows = []
        cursor = self.query(sql, args)
        if cursor.with_rows:
            rows = cursor.fetchall()
        cursor.close()
        return rows

    def fetchone(self, sql, args):
        row = None
        cursor = self.query(sql, args)
        if cursor.with_rows:
            row = cursor.fetchone()
        cursor.close()
        return row

    def __del__(self):
        if self.connection is not None:
            self.connection.close()

  # write your function here for CRUD operations
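
A CRUD helper on top of this class could start with a function that builds a parameterised INSERT statement. The sketch below is only an illustration: build_insert, the pages table and its columns are hypothetical names that do not appear in the class above. It assumes that table and column names are chosen by the programmer, so only the values are passed as query parameters.

```python
def build_insert(table, row):
    """Return (sql, args) for a parameterised INSERT.

    `table` and the keys of `row` are trusted identifiers chosen by the
    programmer; only the values travel as query parameters (%s).
    """
    columns = list(row)
    placeholders = ", ".join(["%s"] * len(columns))
    sql = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders
    )
    return sql, tuple(row[c] for c in columns)


# Hypothetical usage with the DB class above (needs a reachable MySQL server):
# db = DB(config)
# sql, args = build_insert("pages", {"url": "https://example.org", "title": "Example"})
# new_id = db.insert(sql, args)
```

The values never end up inside the SQL string, so quoting is left to the connector.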

solr indexing

filesystem access

web scraping

solr – managed-schema field definitions

name | type | description | active flags | inactive flags
ignored_* | string | catchall for all undefined metadata | multiValued |
id | string | unique id field | stored, required | multiValued
_version_ | plong | internal solr field | indexed, stored |
text | text_general | content field for faceting | multiValued | docValues, stored
content | text_general | main content field as extracted by tika | stored, multiValued, indexed | docValues
author | string | author retrieved from tika | multiValued, indexed, docValues | stored
*author | string | dynamic field for authors retrieved from tika | multiValued, indexed, docValues | stored
title | string | title retrieved from tika | multiValued, indexed, docValues | stored
*title | string | dynamic title field retrieved from tika | multiValued, indexed, docValues | stored
date | string | date retrieved from tika | multiValued, indexed, docValues | stored
content_type | plongs | content_type retrieved from tika | multiValued, indexed, docValues | stored
stream_size | string | stream_size retrieved from tika | multiValued, indexed, docValues | stored
cat | string | category defined by user through manifoldcf | multiValued, docValues | stored

Additional copyField statements to insert data in fields:

  • source="content" dest="text"
  • source="*author" dest="author"
  • source="*title" dest="title"
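
The same copyField rules could also be sent to a running Solr through its Schema API instead of editing managed-schema by hand. The following sketch only builds the JSON body. The helper name copy_field_commands is made up for this example, and the target URL in the last comment is an assumption based on the port used later in this post.

```python
import json

# (source, dest) pairs taken from the copyField list above
COPY_FIELDS = [("content", "text"), ("*author", "author"), ("*title", "title")]


def copy_field_commands(pairs):
    """Build Schema API 'add-copy-field' commands for (source, dest) pairs."""
    return {"add-copy-field": [{"source": s, "dest": [d]} for s, d in pairs]}


payload = json.dumps(copy_field_commands(COPY_FIELDS), indent=2)
print(payload)
# The JSON could then be POSTed to http://localhost:8983/solr/<collection>/schema
```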

solr search server with tika and manifoldcf

I finally managed to get my search server running using solr as the main engine and tika for extraction. The setup is completed by manifoldcf for access to files, emails, wiki, rss and web.

solr

A short overview on the basic file structure of solr is shown below:

filestructure


<solr-home-directory>/
   solr.xml
   core_name1/
      core.properties
      conf/
         solrconfig.xml
         managed-schema
      data/

And here is my very basic core.properties file for a single server without cloud mode.

core.properties


name=collection name
config=solrconfig.xml
dataDir=collection name/data

schema fields from tika

The following fields are essential for my setup:

  • id – the identifier unique for solr
  • _version_ – also some internal stuff for solr
  • content – the text representation of the extraction results from tika
  • ignored_* – as a catchall for any metadata that is not covered by a field in the index

The solr installation follows the instructions given by the project team. As I am using debian, the solr.in.sh is pretty much standard. Here are the settings:


SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j2.xml"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"

Solr is started via the old init.d style script from the project team. No modifications here.

The specific managed-schema and solrconfig.xml files are not listed here, but getting them to work took the most time. Some comments:

  • grab some information on the metadata extracted by tika to find the fields that are worth a second look
  • check the configuration given in /var/solr/data/conf/
  • and especially check the solr log at /var/solr/logs/solr.log
  • managed-schema should be adjusted for the metadata retrieved through tika
  • delete any old collection files by removing /var/solr/data/collection name/collection name/index/
  • solr cell is responsible for importing/indexing files in foreign formats like PDF, Word, etc
  • set stored false as often as possible
  • set indexed false as much as possible
  • remove copyfields as far as possible
  • set indexed false for text_general
  • use catchall field for indexing
  • start JVM in server mode
  • set logging on higher level only
  • integrate everything in tomcat
  • set indexed or docValues to true but not both
  • some field type annotations: Solr Manual 8.11
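
To illustrate how solr cell, the id field and the ignored_* catchall fit together, here is a rough sketch that builds the request URL for the extracting request handler. It assumes that the handler is registered at /update/extract in solrconfig.xml. The function name extract_url and the collection name mycollection are placeholders. The parameters literal.id, uprefix and fmap.content are standard solr cell parameters, but whether they match my solrconfig.xml defaults is not shown here.

```python
from urllib.parse import urlencode


def extract_url(base_url, collection, doc_id, commit=True):
    """Build the URL of Solr's extracting request handler for one document."""
    params = {
        "literal.id": doc_id,        # sets the unique id field
        "uprefix": "ignored_",       # unknown metadata goes to ignored_*
        "fmap.content": "content",   # extracted text goes to the content field
    }
    if commit:
        params["commit"] = "true"
    return "{}/solr/{}/update/extract?{}".format(
        base_url.rstrip("/"), collection, urlencode(params)
    )


print(extract_url("http://localhost:8983", "mycollection", "doc1"))
```

The file itself would be sent as the body of a POST request to that URL.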

some interesting commands

  • bin/solr start
  • bin/solr stop -all
  • bin/post -c collection input
  • bin/solr delete -c collection
  • bin/solr create -c collection -d configdir

velocity setup

velocity may be used as a search interface for solr, but my setup is not complete yet.

tika

The tika server is also installed as described by the project team. I only added a systemd unit file to start it:


    [Unit]
    Description=Apache Tika Server
    After=network.target

    [Service]
    Type=simple
    User=tika
    Environment="TIKA_INCLUDE=/etc/default/tika.in.sh"
    ExecStart=/usr/bin/java -jar /opt/tika/tika-server-standard-2.3.0.jar --port 9998 --config /opt/tika/tika-config.xml
    Restart=always

    [Install]
    WantedBy=multi-user.target

    The tika.in.sh is once again copied from project team suggestion without modifications:


    TIKA_PID_DIR="/var/tika"
    LOG4J_PROPS="/var/tika/log4j.properties"
    TIKA_LOGS_DIR="/var/tika/logs"
    TIKA_PORT="9998"
    TIKA_FORKED_OPTS=""

    The tika-config.xml is quite empty at the moment but I hope to get logging running soon.
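
A quick way to see which metadata tika extracts is its /meta endpoint. The sketch below only prepares the request and does not send it. The helper name tika_meta_request and the dummy payload are invented for this example, and host and port are assumed to match the unit file above.

```python
import urllib.request


def tika_meta_request(data, host="localhost", port=9998):
    """Prepare (but do not send) a PUT request to Tika server's /meta endpoint."""
    return urllib.request.Request(
        "http://{}:{}/meta".format(host, port),
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )


req = tika_meta_request(b"%PDF-1.4 dummy bytes")
print(req.get_method(), req.full_url)

# Sending it requires a running tika server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The returned field names are the candidates for explicit fields in managed-schema; everything else can fall into ignored_*.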

    ManifoldCF

    And finally the manifoldcf installation from scratch as the interface to the various information resources.

    Here is my systemd unit file:

    [Unit]
    Description=ManifoldCF service
    [Service]
    WorkingDirectory=/opt/manifoldcf/example
    ExecStart=/usr/bin/java -Xms512m -Xmx512m -Dorg.apache.manifoldcf.configfile=./properties.xml -Dorg.apache.manifoldcf.jettyshutdowntoken=secret_token -Djava.security.auth.login.config= -cp .:../lib/mcf-core.jar:../lib/mcf-agents.jar:../lib/mcf-pull-agent.jar:../lib/mcf-ui-core.jar:../lib/mcf-jetty-runner.jar:../lib/jetty-client-9.4.25.v20191220.jar:../lib/jetty-continuation-9.4.25.v20191220.jar:../lib/jetty-http-9.4.25.v20191220.jar:../lib/jetty-io-9.4.25.v20191220.jar:../lib/jetty-jndi-9.4.25.v20191220.jar:../lib/jetty-jsp-9.2.30.v20200428.jar:../lib/jetty-jsp-jdt-2.3.3.jar:../lib/jetty-plus-9.4.25.v20191220.jar:../lib/jetty-schemas-3.1.M0.jar:../lib/jetty-security-9.4.25.v20191220.jar:../lib/jetty-server-9.4.25.v20191220.jar:../lib/jetty-servlet-9.4.25.v20191220.jar:../lib/jetty-util-9.4.25.v20191220.jar:../lib/jetty-webapp-9.4.25.v20191220.jar:../lib/jetty-xml-9.4.25.v20191220.jar:../lib/commons-codec-1.10.jar:../lib/commons-collections-3.2.2.jar:../lib/commons-collections4-4.2.jar:../lib/commons-discovery-0.5.jar:../lib/commons-el-1.0.jar:../lib/commons-exec-1.3.jar:../lib/commons-fileupload-1.3.3.jar:../lib/commons-io-2.5.jar:../lib/commons-lang-2.6.jar:../lib/commons-lang3-3.9.jar:../lib/commons-logging-1.2.jar:../lib/ecj-4.3.1.jar:../lib/gson-2.8.0.jar:../lib/guava-25.1-jre.jar:../lib/httpclient-4.5.8.jar:../lib/httpcore-4.4.10.jar:../lib/jasper-6.0.35.jar:../lib/jasper-el-6.0.35.jar:../lib/javax.servlet-api-3.1.0.jar:../lib/jna-5.3.1.jar:../lib/jna-platform-5.3.1.jar:../lib/json-simple-1.1.1.jar:../lib/jsp-api-2.1-glassfish-2.1.v20091210.jar:../lib/juli-6.0.35.jar:../lib/log4j-1.2-api-2.4.1.jar:../lib/log4j-api-2.4.1.jar:../lib/log4j-core-2.4.1.jar:../lib/mail-1.4.5.jar:../lib/serializer-2.7.1.jar:../lib/slf4j-api-1.7.25.jar:../lib/slf4j-simple-1.7.25.jar:../lib/velocity-1.7.jar:../lib/xalan-2.7.1.jar:../lib/xercesImpl-2.10.0.jar:../lib/xml-apis-1.4.01.jar:../lib/zookeeper-3.4.10.jar:../lib/javax.activation-1.2.0.jar:../lib/javax.activation-api-1.2.0.jar: -jar start.jar
    User=solr
    Type=simple
    SuccessExitStatus=143
    TimeoutStopSec=10
    Restart=on-failure
    RestartSec=10
    [Install]
    WantedBy=multi-user.target

snmp v3

to get the net-snmp-config tool the libsnmp-dev package must be installed:

# apt-get install libsnmp-dev
# net-snmp-config --create-snmpv3-user -ro -A 'geheim' -X 'secret' -a SHA -x AES icinga

you may also create a new user using snmp commands:

#snmpusm -v 3 -u -l authNoPriv -a MD5 -A localhost passwd

to simplify the usage a local user profile should be created in ~/.snmp/snmp.conf:

defSecurityName
defContext ""
defAuthType MD5
defSecurityLevel authNoPriv
defAuthPassphrase
defVersion 3

now a simple command looks like:

#snmpget localhost sysUpTime.0
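
Without the profile every option has to be given on the command line. The sketch below spells out the long form as an argument list, for example to call snmpget from a monitoring script. The function name snmpget_argv, the user name monitor and the passphrase are placeholders and not values from my setup.

```python
def snmpget_argv(host, oid, user, auth_pass, auth_proto="MD5", level="authNoPriv"):
    """Return the full snmpget command line that snmp.conf otherwise shortens."""
    return [
        "snmpget",
        "-v", "3",          # defVersion
        "-u", user,         # defSecurityName
        "-l", level,        # defSecurityLevel
        "-a", auth_proto,   # defAuthType
        "-A", auth_pass,    # defAuthPassphrase
        host,
        oid,
    ]


argv = snmpget_argv("localhost", "sysUpTime.0", "monitor", "placeholder-passphrase")
print(" ".join(argv))

# Running it needs net-snmp and a configured agent:
# import subprocess
# subprocess.run(argv, check=True)
```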

second network card with same driver

I own three network cards with an RTL 8139 chipset and finally managed to get them to work with my installation by simply adding a new file:

/etc/modprobe.d/8139too.config

alias eth1 8139too
alias eth2 8139too

The interface eth0 is reserved for the internal network card of the board.

WebFrontEnd for rsnapshot based backup system

As already discussed in "rsnapshot backup solution", I am using rsnapshot as my main backup solution for a variety of servers.
But I am lazy and often forget about the correct settings, commands, etc. For this reason I started building some web front ends for several common daily tasks, just to remember what to do and how, and of course to have some fun with programming.
And here is another part of this series: the Rsnapshot Web Front End.

The complete web page is contained in one file. This was my first requirement, because I wanted something that can be installed very easily. The file sits in the following structure:
main folder:

  • default – the base for new config files
  • pid – the process id file
  • logs – the folder with log files
  • rsnapshot – the folder holding the config files
  • backup.php – the web front end

The default file is the master: it is cloned to get the base for a new config file. All files in the folder rsnapshot are shown in a matrix to access the actions.

The web page itself is self-explanatory and allows you to:

  • view the config and logfiles,
  • create and delete a config file/server entry
  •  start a backup process and
  • edit the config file

Any new entry made in the config file is checked against a regular expression within the backup.php file to avoid misconfigurations. The editing is further divided into normal and expert mode: normal mode shows only the most important options, while expert mode shows all options.
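
The idea of such a check can be sketched in a few lines. The patterns below are made up for illustration and are not the expressions used in backup.php. They only assume what the rsnapshot config itself states (tabs between elements, directories with a trailing slash) and the value lists from the select boxes.

```python
import re

# Hypothetical patterns, one per config option that should be checked.
PATTERNS = {
    "interval": re.compile(r"^(hourly|daily|weekly|monthly)\t\d+$"),
    "verbose": re.compile(r"^[1-5]$"),
    "loglevel": re.compile(r"^[1-5]$"),
    "snapshot_root": re.compile(r"^/.*/$"),  # absolute path with trailing slash
}


def validate(key, value):
    """Return True if the value is acceptable for the given option.

    Options without a pattern are accepted unchanged.
    """
    pattern = PATTERNS.get(key)
    return pattern is None or bool(pattern.match(value))


print(validate("snapshot_root", "/backup/snapshots/"))
print(validate("snapshot_root", "/backup/snapshots"))
```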

The styling/design of the page is very poor and might be improved later on. Also the security of the page is lacking some attention and might change.

But now to the code:

<?php ############# HEAD  ############################
echo "<!DOCTYPE html>\n";
echo "<html>\n<head>\n <meta charset='utf-8'><meta name='viewport' content='width=device-width, initial-scale=1.0'>\n";
############# style sheets ########################
echo "<title>Backup Admin Page</title>\n</head>\n<body>\n\n";
echo "<header><h1>Backup Admin - Main Page</h1>";
echo "<h4>created to configure my own backup system based on rsnapshot</h4></header>\n";
echo "<style>.red { color: red;} .blue {color: blue;} .hide {display: none;} .comment { width: 100%;}</style>";
############# Configuration ##########################
$config['configfile'] = "rsnapshot/";# folder for config files
$config['logfile'] = "log/";# folder for logfiles
$config['command'] = "ls -l";# command to be executed to start backup
##################################### no editing below here ###########################
$PHP_SELF = htmlspecialchars($_SERVER['PHP_SELF']); # use one file only which will be reload with new parameters all the time
$general_menu = array( "Cancel", "NewHost");# general menue options not associated with host/server
$menue_items = array("ShowConfigfile", "StartBackup", "ShowLogfile", "EditConfigfile", "Delete");# menue items for each host/server
$config['columns'] = count($menue_items);# the number of columns in total
$config['values'] = array("snapshot_root", "interval", "backup", "no_create_root", "cmd_cp", "cmd_rm",
		"cmd_rsync", "cmd_ssh", "cmd_logger", "cmd_du", "verbose", "loglevel", "rsync_long_args", "ssh_args", 
		"du_args", "logfile", "lockfile", "config_version");
$config["inputtype"]["interval"] = array("weekly","hourly","daily","monthly");
$config["inputtype"]["verbose"] = array("1","2","3","4","5");
$config["inputtype"]["loglevel"] = array("1","2","3","4","5");
##################################### function definitions ########################################
####### clear html variables from malicious code
function clean_html($variable){
	return trim(htmlspecialchars($variable));
}
###### debug output of POST
function debug_output(){
	echo "<pre><br><h1>Here comes the mess!</h1>";
	print_r($_POST);
	echo "</pre>"; return ;
}
###### parse contents of configfile and store everything in two arrays
function parse_file_contents($filename){
	global $inputvalues; # one matrix for all values already filled with defaults
	if ($data = file_get_contents($filename)){
		$contents = explode("\n", $data);$comment = "";
		foreach ($contents as $line) { # parse each line of the input file seperately
			$line = trim($line); $value = ""; # and override default values with config file values
			if (strlen($line) && substr($line, 0, 1) == '#') # a comment line
				$comment .= "\n".$line; 
			elseif (strlen($line)){ # non empty line found
				$key = trim(strtok($line, " \t")); # name of the specific option is first parameter
				while ($inputpart = strtok(" \t")) $value = $value.trim($inputpart)."\t"; # rest of the line is combined as value
				if ($key == "exclude") # special treatement for excluded directories
					$inputvalues["exclude"][] = trim($value);
				else # normal value will override default values in array
					$inputvalues["values"][$key] = trim($value);
				if ($comment != "") {# non enpty comment will be stored
					$inputvalues['comment'][$key] = $comment; 
					$comment = "";
				}
			}
		}
		$return = "File: ".$filename." successfully parsed!";
	}
	else # could not get file contents via function so die
		$return = "Could not open file: ".$filename;
	return $return;
}
####### save config file
function save_conf_file($inputvalues, $filename){
	global $config;
	$output = "";
	foreach ($config['values'] as $key){# iterate over all config file parameters
		if (isset($inputvalues['comment'][$key])) $output .= $inputvalues['comment'][$key]."\n";
		if (isset($inputvalues['values'][$key]))  $output .= $key."\t".$inputvalues['values'][$key]."\n";
	}
	if (isset($inputvalues['comment']['exclude'])) $output .= $inputvalues['comment']['exclude']."\n";
	if (isset($inputvalues['exclude']))# the exclude list may be missing in the posted data
		foreach ($inputvalues['exclude'] as $value)
			if ($value != "") $output .= "exclude"."\t".$value."\n";
	
	/*
	if ($handle = fopen($filename, "w")){ # open configfile for write
		fwrite($handle, $output);
		fclose($handle);
	}
	
	else 
		$return = "Could not open file: ".$filename;
		*/
	return "<pre>".$output."</pre>";
#	return $return;
}
####### edit values using defaults or values from config file
function edit_conf_file($inputvalues, $config, $mode){
	$class = "show";
	$return = "<table border='1' width='100%'>\n<tr><td colspan='4' align='right'>";
	$return .= "<input type='submit' name='action[Expert]' value='".$mode."'></td></tr>\n";
	foreach ($config['values'] as $key){
		if ($key == "no_create_root" && $mode == "Normal") $class = "hide";# show the remaining entries only in expert mode
		if (isset($inputvalues['comment'][$key])){# any comments for the forthcoming option?
			$rows = substr_count($inputvalues['comment'][$key], "\n") + 2;
			$return .= "<tr class='".$class."'><td colspan='3'><textarea cols='100%' rows='".$rows;
			$return .= "' name='inputvalues[comment][".$key."]'> ".$inputvalues['comment'][$key];
			$return .= " </textarea></td></tr>\n";
		}# and now the entry
		$return .= "<tr class='".$class."'><td>".$key."</td>";
		if (isset($config['inputtype'][$key])) {# no simple input field but a select box
			$return .= "<td><select name='inputvalues[values][".$key."]' size='1'>";
			foreach ($config['inputtype'][$key] as $value){
				$return .= "<option";
				if(isset($inputvalues['values'][$key]) && strtok($inputvalues['values'][$key], " ") == $value ) $return .= " selected ";
				$return .= ">".$value."</option>\n";
			}
			$return .= "</select></td>";
		} else {# simple input field
			$return .= "<td><input type='input' name='inputvalues[values][".$key."]' value='";
			if (isset($inputvalues['values'][$key])) $return .= $inputvalues['values'][$key]; 
			$return .= "'></td>";
		}
		$return .= "<td></td></tr>\n";
	}################# Exclude directories ##################
	$return .= "<tr class='show'><td colspan='4'>Exclude Directories:</td></tr>\n";
	if (isset($inputvalues['comment']['exclude'])){
		$rows = substr_count($inputvalues['comment']['exclude'], "\n") + 2;
		$return .= "<tr class='show'><td colspan='3'><textarea cols='100%' rows='".$rows;
		$return .= "' name='inputvalues[comment][exclude]'> ".$inputvalues['comment']['exclude'];
		$return .= " </textarea></td></tr>\n";
	}
	if (isset($inputvalues['exclude'])){
		foreach ($inputvalues['exclude'] as $subkey => $value){# show excludes as checkbox list
			if ($value != "") {# only if there is an entry and not an empty line
				$return .= "<tr class='show'><td colspan='2'>".$value."</td><td><input type='checkbox' name='inputvalues[exclude]";
				$return .= "[".$subkey."]' value='".$value."' checked></td></tr>\n";
			}
		}
	} 
	for ($i=0; $i < 5; $i++)# add 5 extra rows for exclude input
		$return .= "<tr class='show'><td>Add exclude folder</td><td colspan='2'><input type='input' value='' name='inputvalues[exclude][]'></td></tr>\n";
	$return .= "<tr class='show'><td align='center' colspan='2'><input type='submit' value='SaveConfigfile' ";
	$return .= "name='action[SaveConfigfile]'></td><td align='center' colspan='2'>";
	$return .= "<input type='submit' name='action[Cancel]' value='Cancel'></td></tr>\n</table>\n";
	return $return;
}
####### function generate menu
function generate_menue($menue_items, $configfile){
	$return = "<tr><thead align='center'>";
	foreach ($menue_items as $key) # table headers
			$return .= "<th>".$key."</th>";
	$return .= "</thead>\n<tbody align='center'>\n";
	foreach (scandir($configfile) as $file){# iterate over complete directory contents
		if (substr($file, 0, 1) != ".") { # dotfiles are ignored
			$file = htmlspecialchars($file);$return .= "<tr>";
			foreach ($menue_items as $key) {
				$return .= "<td><input type='submit' value='  ".$file."  ' name='action[".$key."]'></td>";
			}
			$return .= "</tr>\n";
		}
	}
	return $return;
}
######################################################################
############################## MAIN ##################################
######################################################################
echo "<form method='POST' action='".$PHP_SELF."'>\n";# the <body> tag is already open, so only the form starts here
echo "<table border=1 cellspacing='10'>\n";# everything will be bound to one table
echo generate_menue($menue_items, $config['configfile']);
$cols = round($config['columns'] / 2);$cols_remain = $config['columns'] - $cols;
echo "<tr><td colspan='".$cols."'><input type='submit' name='action[Cancel]' value='Cancel'></td>";
echo "<td colspan='".$cols_remain."'><input type='submit' name='action[NewHost]' value='NewHost'></td></tr>";
######################### Main Menue #####################################
if (isset($_POST['expertMode']) && $_POST['expertMode'] == "Expert") $mode = "Expert";
else $mode = "Normal";
if (isset($_POST['action'])) {
	$menue_entry = array_keys($_POST['action'])[0];# action[<menue_entry>] -> <host>/<menue_entry>
	$host = clean_html($_POST['action'][$menue_entry]);
	switch ($menue_entry){
		case "EditConfigfile":###########################################
			if (file_exists($config['configfile'].$host)) {
				$return = "<h3>Edit (".$host.")</h3>";# show headline
				$return .= parse_file_contents($config['configfile'].$host);
				$return .= "<input type='hidden' name='hostname' value='".$host."'>";			
				$return .= "<pre class='blue'>".edit_conf_file($inputvalues, $config, $mode)."</pre>";
			} else {
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>File: ".$config['configfile'].$host." does not exist!</pre>\n";# show error message
			}
			break;
		case "Expert":###################################################
			if (isset($_POST['hostname'])){
				$return = "<h3>Edit (".$_POST['hostname'].")</h3>";# show headline
				if ($host == "Expert") $mode = "Normal";
				else $mode = "Expert";
				$return .= "<input type='hidden' name='hostname' value='".$_POST['hostname']."'>";			
				$return .= "<pre class='blue'>".edit_conf_file($_POST['inputvalues'], $config, $mode)."</pre>";
			} else {
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>Error! No host defined!</pre>\n";# show error message
			}
			break;
		case "SaveConfigfile":############################################
			if (isset($_POST['inputvalues'])){
				$return = "<h3>Save Configfile (".$host.")</h3>";# show headline
				$return .= "<pre class='blue'>".save_conf_file($_POST['inputvalues'], $config['configfile'].$host)."</pre>";# write file contents
			} else {
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>Error! No host defined!</pre>\n";# show error message
			}
			break;
		case "ShowConfigfile":############################################
			if (file_exists($config['configfile'].$host)){# check if file exists in folder
				$return = "<h3>Show Configfile (".$host.")</h3>";# show headline
				$return .= "<pre class='blue'>".file_get_contents($config['configfile'].$host)."</pre>";# get file contents
			} else { 
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>File: ".$config['configfile'].$host." does not exist!</pre>";# show error
			}
			break;
		case "StartBackup":###############################################
			$befehl = $config['command']." ".escapeshellarg($config['configfile'].$host)." 2>&1";# create backup command, file name escaped for the shell
			if (file_exists($config['configfile'].$host)){# check if file exists
				$return = "<h3>Start Backup (".$host.")</h3>";
				$return .= "<pre class='blue'>Execute Command: (".$befehl.")</pre>";
				$return .= exec($befehl, $output, $return_val);# execute backup command
#				$return .= $output." ".$return_val;
			} else {
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>File: ".$config['configfile'].$host." does not exist!</pre>";# show error message
			}
			break;
		case "ShowLogfile":##############################################
			if (file_exists($config['logfile'].$host)){ # check if file exists
				$return = "<h3>Show Logfile (".$host.")</h3>";# show headline
				$return .= "<pre class='blue'>".file_get_contents($config['logfile'].$host)."</pre>";# get file contents
			} else { 
				$return = "<h3>Error (".$host.")</h3>";# show headline
				$return .= "<pre class='red'>File: ".$config['logfile'].$host." does not exist!</pre>";# show error message
			}
			break;
		case "Delete":###################################################
			$return = "<h3>Delete (".$host.")</h3>";# show headline
			$return .= "<div class='red'>You really want to delete:"; 
			$return .= "<input type='submit' name='action[ReallyDelete]' value='".$host."'></div>";
			break;
		case "ReallyDelete":#############################################
			$return = "<h3>Delete (".$host.")</h3>";# show headline
			if (file_exists($config['configfile'].$host)){
				if (unlink($config['configfile'].$host)){# unlink() has already removed the file at this point
						$return .= "<pre class='blue'>Host: ".$host." successfully deleted!</pre>";
				}
				else 
					$return .= "<pre class='red'>An error occured while deleting Host: ".$host."</pre>";
			}
			else 
				$return .= "<pre class='red'>File: ".$config['configfile'].$host." does not exist!</pre>\n";# show error message
			break;
		case "NewHost":##################################################
			$return = "<h3>Create (New Host)</h3>";# show headline
			$return .= "<div class='blue'>Please name new host entry:";
			$return .= "<input type='input' name='hostname' value='New Host'>";
			$return .= "<input type='submit' name='action[ReallyNewHost]' value='Create'></div>";
			break;
		case "ReallyNewHost":############################################
			$return = "<h3>Create (".$_POST['hostname'].")</h3>";# show headline
			if (file_exists($config['configfile'].$_POST['hostname'])) 
				$return .= "<pre class='red'>Host: <em>".$_POST['hostname']."</em> already exists!</pre>";
			else{
				if (copy('default', $config['configfile'].$_POST['hostname'])) {
					$return .= "<pre class='blue'>New Host: <em>".$_POST['hostname']."</em> created!</pre>";
					$return .= parse_file_contents($config['configfile'].$_POST['hostname']);
					$return .= "<input type='hidden' name='hostname' value='".$_POST['hostname']."'>";
					$return .= "<pre class='blue'>".edit_conf_file($inputvalues, $config, $mode)."</pre>";
				}
				else 
					$return .= "<pre class='red'>Error while copying!</pre>";
			}
			break;
		default:#########################################################
			$return = "<h3>Error (".$menue_entry.")</h3>";# show headline
			$return .= "<pre class='red'>Please choose one of the options above!</pre>";
			break;
	}
}
elseif (isset($_POST['Comment'])){# comment button pressed
	$menue_entry = clean_html(array_keys($_POST['Comment'])[0]);# Comment[<config file option>][<line no>] -> <+/->
	$line = clean_html(array_keys($_POST['Comment'][$menue_entry])[0]);
		if (isset($_POST['hostname'])){
			$return = "<h3>Edit (".$_POST['hostname'].")</h3>";# show headline
			$return .= "<input type='hidden' name='hostname' value='".$_POST['hostname']."'>";
			$inputvalues = $_POST['inputvalues']; unset($inputvalues['comment'][$menue_entry][$line]);# delete specific line from variable			
			$return .= "<pre class='blue'>".edit_conf_file($inputvalues, $config, $mode)."</pre>";
		}
		else 
			$return = "<pre class='red'>Error! No host defined!</pre>\n";# show error message
}
else $return = "";
$return .= "<input type='hidden' name='expertMode' value='".$mode."'>";
################# Main Menue End ########################################
echo "<tr><td colspan='".$config['columns']."' align='left'>".$return."</td></tr>";
echo "</tr>\n</tbody></table>\n";
echo "<pre>";print_r($config);echo "</pre>";
echo "<pre> ";print_r($inputvalues);echo "</pre>";
debug_output();
############################# END #################################################################
echo "<br><br>";
echo "</form>\n<footer id='footer'>My personal page hosted on my own server &copy; olkn</footer></body></HTML>\n";
########### END ###################################?>

rsnapshot backup solution

In order to back up my systems I have chosen rsnapshot because it stores all files as normal files on the hard disk and uses hard links between different backup dates to reduce the total size of the backup. This enables me to check directly in the file system for old files. Normally I do not restore a complete backup but only look for specific files.
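
The effect of the links can be tried out in a temporary directory. This is only a small sketch of the principle on a file system that supports hard links, not of what rsnapshot does internally. The folder names daily.0 and daily.1 merely follow the usual naming, and the file name is invented.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    older = os.path.join(root, "daily.1")
    newer = os.path.join(root, "daily.0")
    os.mkdir(older)
    os.mkdir(newer)

    # A file that did not change between two backup runs ...
    original = os.path.join(older, "notes.txt")
    with open(original, "w") as handle:
        handle.write("unchanged between two backups\n")

    # ... is not copied but linked into the newer snapshot.
    linked = os.path.join(newer, "notes.txt")
    os.link(original, linked)

    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    link_count = os.stat(original).st_nlink
    print(same_inode, link_count)
```

Both paths point to the same data on disk, so the file takes up space only once but is visible as a normal file in every snapshot folder.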
To simplify things further I found a description of how to use rsync on the client machines together with a server certificate, so that no interaction with the client is needed. The server connects using its certificate and starts a local rsync server on the client, which handles the backup itself. And now to the steps required to set up the system on the client:

CLIENT
First we need rsync installed:

aptitude install rsync

then we need to create a special backup user:

useradd backupuser -c "Backup" -m -u 4210

Within the home folder of the new user we need to store the server certificate as .ssh/authorized_keys

mkdir .ssh
scp /home/backupuser/.ssh/authorized_keys

Now we need a special script which starts the rsync program on the client listening for incoming requests:

scp /home/backupuser/

Finally some tweaking for the permissions:

chown backupuser:backupuser
chmod 755

We also need to allow the new user to execute rsync, so we adjust sudo for this by editing /etc/sudoers:

Defaults:backupuser !requiretty
Defaults:backupuser !visiblepw
backupuser ALL=NOPASSWD: /usr/bin/rsync

That's it!
I also wrote a simple script that shows a list of the available rsnapshot config files stored on the backup server and starts the backup for the selected one, i.e. backs up a specific system.
Here are the relevant scripts:
First the wrapper script, which simply logs the current date and starts rsync using the command line options from the rsnapshot remote config:


#!/bin/sh

/usr/bin/logger "BACKUP: Start backup at `/bin/date` ";
/usr/bin/sudo /usr/bin/rsync "$@";
/usr/bin/logger "BACKUP: Backup finalized at `/bin/date` ";

And here is the console script to start the backup:


#!/bin/sh
#
DIALOG="/usr/bin/dialog"
RSNAPSHOT="/usr/bin/rsnapshot"
CONFIGDIR=""
BACKUPDIR=""
CONFIGFILES=""

##### Check if Backupdisk is connected and correctly mounted ################
check_for_backup_dir() {
    if [ ! -e $BACKUPDIR/ps2 ]
    then
        if [ $# -ne 0 ]
        then
            $DIALOG --clear --title "rsnapshot config" --msgbox "The backup disk is not connected to the machine!" 10 52
        else
            echo "The backup disk is not connected to the machine!"
        fi
        exit
    fi
}

##### main ######
# choose dialog interface or command line parameter
if [ $# -ne 0 ]
then
    if [ $# -gt 1 ]
    then
        echo "Usage: backup.sh or backup.sh "
        echo " too many arguments on command line"
        exit
    fi
    check_for_backup_dir
    # command parameter available so no dialog interface used
    if [ ! -e $CONFIGDIR/$1 ]
    then
        echo "No config file for system $1"
        exit
    else
        COMMAND="$RSNAPSHOT -q -c $CONFIGDIR/$1 weekly"
        $COMMAND
    fi
else
    check_for_backup_dir dialog
    # no parameter available on command line so using dialog for choosing configuration
    for file in $CONFIGDIR/*
    do # create one variable including all available configs
        CONFIGFILES="$CONFIGFILES $file ! "
    done
    TEMPFILE=`tempfile 2>/dev/null` || TEMPFILE=/tmp/test$$
    trap "rm -f $TEMPFILE" 0 1 2 5 15
    $DIALOG --clear --title "rsnapshot config" --menu "Choose rsnapshot configuration:" 18 50 10 $CONFIGFILES 2> $TEMPFILE
    RUECKGABE=$?
    case $RUECKGABE in
        0)
            AUSWAHL=`cat $TEMPFILE`
            COMMAND="$RSNAPSHOT -q -c $AUSWAHL weekly" ### execute backup command
            $COMMAND
            ;;
        1)
            echo "Backup canceled!"
            ;;
        255)
            echo "ESC pressed!"
            ;;
    esac

    trap "rm -f $TEMPFILE" 0 1 2 5 15

fi
# clean up and exit after all work is done
exit
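
The non-interactive branch of the script boils down to two steps: check that a config file exists for the given system and build the rsnapshot call. As a rough sketch of that logic, the function below does the same in Python. The name rsnapshot_command and its parameters are chosen for this example only, and nothing is executed.

```python
import os


def rsnapshot_command(config_dir, system, interval="weekly",
                      rsnapshot="/usr/bin/rsnapshot"):
    """Return the rsnapshot call for one system as an argument list."""
    config = os.path.join(config_dir, system)
    if not os.path.isfile(config):
        raise FileNotFoundError("No config file for system {}".format(system))
    return [rsnapshot, "-q", "-c", config, interval]


# Hypothetical usage on the backup server:
# import subprocess
# subprocess.run(rsnapshot_command("/path/to/configs", "host1"), check=True)
```

Passing the arguments as a list avoids the word splitting that the shell version relies on when it expands $COMMAND.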

The rsnapshot config files always look like the original ones delivered with rsnapshot except for the ssh section which starts the remote wrapper script shown above:


#################################################
# rsnapshot.conf - rsnapshot configuration file #
#################################################
# #
# PLEASE BE AWARE OF THE FOLLOWING RULES: #
# #
# This file requires tabs between elements #
# #
# Directories require a trailing slash: #
# right: /home/ #
# wrong: /home #
# #
#################################################
#
#######################
# CONFIG FILE VERSION #
#######################

config_version 1.2

###########################
# SNAPSHOT ROOT DIRECTORY #
###########################

# All snapshots will be stored under this root directory.
snapshot_root

# If no_create_root is enabled, rsnapshot will not automatically create the
# snapshot_root directory. This is particularly useful if you are backing
# up to removable media, such as a FireWire drive.
#
no_create_root 1

#################################
# EXTERNAL PROGRAM DEPENDENCIES #
#################################

# LINUX USERS: Be sure to uncomment "cmd_cp". This gives you extra features.
# EVERYONE ELSE: Leave "cmd_cp" commented out for compatibility.
#
# See the README file or the man page for more details.
#
cmd_cp /bin/cp

# uncomment this to use the rm program instead of the built-in perl routine.
cmd_rm /bin/rm

# rsync must be enabled for anything to work.
cmd_rsync /usr/bin/rsync

# Uncomment this to enable remote ssh backups over rsync.
cmd_ssh /usr/bin/ssh

# Comment this out to disable syslog support.
cmd_logger /usr/bin/logger

# Uncomment this to specify a path to "du" for disk usage checks.
cmd_du /usr/bin/du

#########################################
# BACKUP INTERVALS #
# Must be unique and in ascending order #
# i.e. hourly, daily, weekly, etc. #
#########################################

# The interval names (hourly, daily, ...) are just names and have no influence
# on the length of the interval. The numbers set the number of snapshots to
# keep for each interval (hourly.0, hourly.1, ...).
# The length of the interval is set by the time between two executions of
# rsnapshot, this is normally done via cron.
# Feel free to adapt the names, and the sample cron file under /etc/cron.d/rsnapshot
# to your needs. The only requirement is that the intervals must be listed
# in ascending order. To activate just uncomment the entries.

interval weekly 5

############################################
# GLOBAL OPTIONS #
# All are optional, with sensible defaults #
############################################

# If your version of rsync supports --link-dest, consider enabling this.
# This is the best way to support special files (FIFOs, etc) cross-platform.
# The default is 0 (off).
# In Debian GNU cp is available which is superior to link_dest, so it should be
# commented out (disabled).
#
#link_dest 0

# Verbose level, 1 through 5.
# 1 Quiet Print fatal errors only
# 2 Default Print errors and warnings only
# 3 Verbose Show equivalent shell commands being executed
# 4 Extra Verbose Show extra verbose information
# 5 Debug mode More than you care to know
#
verbose 2

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
loglevel 3

# If you enable this, data will be written to the file you specify. The
# amount of data written is controlled by the "loglevel" parameter.
logfile

# The include and exclude parameters, if enabled, simply get passed directly
# to rsync. If you have multiple include/exclude patterns, put each one on a
# separate line. Please look up the --include and --exclude options in the
# rsync man page for more details.
#

exclude /dev
exclude /lost+found
exclude /media
exclude /mnt
exclude /proc
exclude /run
exclude /sys
exclude /tmp

# The include_file and exclude_file parameters, if enabled, simply get
# passed directly to rsync. Please look up the --include-from and
# --exclude-from options in the rsync man page for more details.
#
#include_file /path/to/include/file
#exclude_file /path/to/exclude/file

# Default rsync args. All rsync commands have at least these options set.
#
rsync_long_args -v --numeric-ids --relative -ev --rsync-path=/home/backupuser/rsync-wrapper.sh

# ssh has no args passed by default, but you can specify some here.
#
ssh_args -i

# Default arguments for the "du" program (for disk space reporting).
# The GNU version of "du" is preferred. See the man page for more details.
#
du_args -csh

# If this is enabled, rsync won't span filesystem partitions within a
# backup point. This essentially passes the -x option to rsync.
# The default is 0 (off).
#
#one_fs 0

# If enabled, rsnapshot will write a lockfile to prevent two instances
# from running simultaneously (and messing up the snapshot_root).
# If you enable this, make sure the lockfile directory is not world
# writable. Otherwise anyone can prevent the program from running.
#
lockfile /path-to-pid-file

###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# LOCALHOST
backup @:/ root/

SERVER
We have already seen the backup script that I use to keep an overview of all my systems and to choose a specific system to back up. For this to run we of course need rsnapshot, rsync and dialog.

aptitude install rsync rsnapshot dialog

Then a dedicated user is required, which I will call backupserver:

useradd backupserver -c "Backup user on server" -m

For this user we need to create an SSH key pair:

ssh-keygen

Follow the instructions and do not set a passphrase, because otherwise we would have to enter it every time a backup is taken.
Now we can copy the public key to all client machines and we are done.
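
Copying the public key to the clients could be scripted roughly like this. It is only a sketch under assumptions: the client host names are made up, the key file name id_rsa.pub depends on what ssh-keygen actually generated, and the remote user backupuser is taken from the rsync-path in the config above. By default the function only prints the ssh-copy-id commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Hypothetical client list; replace with your own machines.
CLIENTS="client1.example.org client2.example.org"
# Remote user as used in the rsync-path of the rsnapshot config.
REMOTE_USER="backupuser"
# Assumed key file name; check what ssh-keygen created for you.
PUBKEY="$HOME/.ssh/id_rsa.pub"

# Print (default) or run ssh-copy-id for each client.
distribute_key() {
    for host in $CLIENTS
    do
        if [ "${DRY_RUN:-1}" = "1" ]
        then
            echo "ssh-copy-id -i $PUBKEY $REMOTE_USER@$host"
        else
            ssh-copy-id -i "$PUBKEY" "$REMOTE_USER@$host"
        fi
    done
}

distribute_key
```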
Finally, I installed screen and set it as the default shell for the backupserver user, so that I can detach from the session and close the connection to my headless backup server while the backup keeps running in the background.

aptitude install screen

I also added the following to /etc/screenrc, which gives me a normal bash on Debian when using screen:

shell /bin/bash

A reminder to myself:

Set the correct ownership and permissions on the backup folder so that the backupserver user can write the backup into it, and always use this user when performing backups. Do not use the root user for that, because it will break the permission structure in the backup folder and does not work with the SSH key created earlier.
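
To avoid starting a backup as the wrong user by accident, a small check could run before rsnapshot is called. This is a sketch only; the function name preflight and the backup path in the usage line are assumptions, not part of my setup as described above.

```shell
#!/bin/sh
# Refuse to continue unless the current user is the expected one
# and the backup directory exists and is writable.
# preflight is a hypothetical helper name.
preflight() {
    expected_user=$1
    backup_dir=$2
    if [ "$(id -un)" != "$expected_user" ]
    then
        echo "run this as $expected_user, not $(id -un)" >&2
        return 1
    fi
    if [ ! -d "$backup_dir" ] || [ ! -w "$backup_dir" ]
    then
        echo "$backup_dir is missing or not writable" >&2
        return 1
    fi
    return 0
}
```

It could be called near the top of the backup script, for example as `preflight backupserver /path/to/backup || exit 1`.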

vnc connection via ssh tunnel

To use one of my servers as a development platform I have chosen VNC through an SSH tunnel to access a graphical user interface. The GUI is necessary in order to play around with Eclipse and Android development, and VNC is used because it should keep the network traffic to a bare minimum, leaving my slow internet connection available for other services as well.

I will start with vnc4server as it seems to be available in the standard Debian repos, so a:
aptitude install vnc4server
should do the trick. To get a display to connect to the next step is:
vnc4server -geometry 1024x768 -depth 24
which will create the display with the specified geometry. You will be prompted for a password afterwards which will be used for the remote connection. To get the new server running just type:
vnc4server
and that's it. The server is stopped by:
vnc4server -kill :1
The corresponding viewer part will be installed by:
aptitude install xvnc4viewer
and you will be able to access the server by issuing:
xvnc4viewer
To have a graphical UI available I start with xfce on the remote machine in the first step to see if it suffices.
On the server side the desktop will be started by including the following command in /etc/vnc/xstartup:
startxfce4
whereas on the client side you should issue the following command once you are connected to the server through a VNC viewer:
xfdesktop
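
The SSH tunnel itself can be sketched as follows. A VNC display :N listens on TCP port 5900+N, so display :1 from above is port 5901. The host name and user in the example are made up, and the two helper functions are only there to show how the port is derived; they print the ssh command instead of running it.

```shell
#!/bin/sh
# VNC display :N listens on TCP port 5900+N.
vnc_port() {
    echo $((5900 + $1))
}

# Print the ssh command that forwards the local VNC port to the server.
# $1 = display number, $2 = user@host of the server (hypothetical here).
tunnel_command() {
    display=$1
    remote=$2
    port=$(vnc_port "$display")
    echo "ssh -N -L $port:localhost:$port $remote"
}

tunnel_command 1 user@devserver.example.org
# prints: ssh -N -L 5901:localhost:5901 user@devserver.example.org
# Run the printed command in one terminal, then connect the viewer
# to the local end of the tunnel with:
#   xvnc4viewer localhost:1
```

With the tunnel in place the viewer talks to localhost only, and the VNC traffic travels inside the SSH connection.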

Cups Print Server with remote printer driver

To enable remote printing on a print server, first set up the server with the correct printer drivers. In my case I had to install the foo2zjs package in order to get my printer running. My network printer was then configured using the CUPS web interface as socket://:9100, which is quite specific to my HP printer.
Remote printing must be configured using the following entries in cupsd.conf:

Listen ip of printserver:631
BrowseOrder Deny,Allow
BrowseAllow From local net/255.255.255.0

Order deny,allow
Deny From All
Allow From localnet /255.255.255.0

On the client side, add the following directive to cupsd.conf:

BrowsePoll ip of printserver :631

In newer versions of cups the directive must be placed in cups-browsed.conf. The directive itself remains the same.

Do not forget to restart both servers in order to get the new directives working. On the client machine the remote printers should then show up in the web interface as newly added printers, ready for jobs.

In newer versions, do not forget to restart the cups-browsed service as well.