Installing Seafile, a free and open source Dropbox alternative

I've decided to move away from Dropbox. I am trying to move away from closed-source software in general and have recently learned about Seafile, a free, open-source file hosting system that is relatively easy to set up if you already have a server up and running.

I found an extremely helpful guide online, but it needed a couple of changes on my system and is in German, so I've decided to share the steps I performed to get Seafile up and running on my Raspberry Pi here.

First, install all necessary packages:

sudo apt-get install python2.7 python-setuptools python-simplejson python-imaging sqlite3

Next, add a user for seafile and log in as this user:

sudo adduser seafile --disabled-login
sudo su - seafile

As the seafile user, download the latest version from GitHub, extract it and start the service:

wget https://github.com/haiwen/seafile-rpi/releases/download/v4.1.2/seafile-server_4.1.2_pi.tar.gz
tar xf seafile-server_4.1.2_pi.tar.gz
cd seafile-server-4.1.2/
./seafile.sh start
./seahub.sh start 8000
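
To confirm that Seahub is actually answering after the start commands, a small port check can help. This is only a minimal sketch: the function name port_is_open is my own, and "localhost" assumes the check is run on the Pi itself; 8000 is the port passed to seahub.sh above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


if __name__ == "__main__":
    # Assumption: run on the Pi itself, with Seahub started on port 8000.
    print("Seahub reachable:", port_is_open("localhost", 8000))
```

An open port only shows that something is listening, not that the web interface works, so opening the Pi's address on port 8000 in a browser is still the real test.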

Sharing files between the Raspberry Pi and Windows 8

In order to share files between the raspberry pi and Windows we need to install samba:

sudo apt-get install samba samba-common-bin

We are going to use a dedicated user, samba, for all file sharing and will be using their home folder as the share location. This means we first need a new Unix user:

sudo adduser samba

In addition, we also need to set up this user as a samba user:

sudo smbpasswd -a samba

Now the tricky part. Basically all samba functionality is controlled via one file, smb.conf. In it, we can specify the security and authentication settings and which folders and services to share. After backing up the file

sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.$(date +%Y-%m-%d)

we are going to create a new file with the following info:

sudo rm /etc/samba/smb.conf
sudo vi /etc/samba/smb.conf

[global]
    workgroup = WORKGROUP
    netbios name = pi
    server string = Pi Samba Share
    encrypt passwords = True
    security = user
    socket options = IPTOS_LOWDELAY TCP_NODELAY
    wins support = no

[PiShare]
    comment = Pi Samba Share
    path = /home/samba
    read only = No
    valid users = samba

All that is left to do is restart the service:

sudo service samba restart

You should now be able to access the share by going to a Windows computer on the same network (and workgroup!) and connecting to \\pi\PiShare with the user pi\samba and the password you selected above.
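
Since smb.conf uses an INI-like layout, a quick way to double-check what a share section says is to read it with Python's configparser. This is a rough sketch only: Samba has its own parser (the testparm tool from samba-common-bin is the authoritative check), and the function name summarize_shares and the shortened config text are my own.

```python
import configparser

# Shortened copy of the config above, used here as sample input.
SMB_CONF = """\
[global]
    workgroup = WORKGROUP
    netbios name = pi
    security = user

[PiShare]
    path = /home/samba
    read only = No
    valid users = samba
"""


def summarize_shares(conf_text):
    """Return path, write access and allowed users for every share section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    shares = {}
    for name in parser.sections():
        if name.lower() == "global":
            continue  # [global] holds server-wide settings, not a share
        section = parser[name]
        shares[name] = {
            "path": section.get("path"),
            # Assumption: treat a missing "read only" line as read-only.
            "writable": not section.getboolean("read only", fallback=True),
            "valid_users": section.get("valid users", fallback="").split(),
        }
    return shares


print(summarize_shares(SMB_CONF))
```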

Configuring the USB drive

Although the Raspberry Pi has an 8GB micro SD card, we want to connect an external USB hard drive and use that as the main drive for our files. This is both because 8GB is not a lot when we want to store media, and because SD cards are not the most reliable storage media and we want to reduce the possibility of losing all our stuff.

You can see all drives connected to the server by entering

sudo fdisk -l

This will show the SD card (/dev/mmcblk0pX) with multiple partitions and the USB drive (/dev/sda1). We want to change the settings such that the USB drive is automatically mounted on /home. This will mean that all user data will be stored on the hard drive.

The fstab file contains the mount points for all drives. Back it up before making any changes:

sudo cp /etc/fstab /etc/fstab.$(date +%Y-%m-%d)

and then add the following line to /etc/fstab:

/dev/sda1       /media/home     ext4    defaults        0       1

and mount the drive in that folder:

sudo mkdir /media/home
sudo mount -a
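
For reference, the fstab line used here has six whitespace-separated fields: device, mount point, file system type, mount options, dump flag and fsck pass number. The sketch below just splits such a line into named fields; the names FstabEntry and parse_fstab_line are my own, and for simplicity it insists on all six fields being present.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_pass: int


def parse_fstab_line(line):
    """Split one fstab line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields, got %d" % len(fields))
    device, mount_point, fs_type, options, dump, fsck_pass = fields
    return FstabEntry(device, mount_point, fs_type,
                      options.split(","), int(dump), int(fsck_pass))


entry = parse_fstab_line(
    "/dev/sda1       /media/home     ext4    defaults        0       1")
print(entry)
```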

We're copying everything to the new drive:

sudo rsync -aXS /home /media
sudo diff -r /home /media/home 

The second command compares the two folders and makes sure the copy was successful. We can now open /etc/fstab again and change the mount point from /media/home to /home. To be on the safe side, it's best to back up the old home folder just in case:

cd / && sudo mv /home /old_home && sudo mkdir /home

After that, we only need to remount all drives:

sudo mount -a

and the hard drive should be mounted in the home folder. If you are convinced that everything went well, just delete the /old_home folder.
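
The folder comparison done with diff -r earlier can also be sketched in Python with the standard filecmp module. This is a rough equivalent, not a replacement: the function name differing_paths is my own, and by default filecmp treats files with the same size and modification time as equal without reading them, which is less strict than diff.

```python
import filecmp


def differing_paths(left, right):
    """List relative paths that are missing on one side or differ."""
    def walk(cmp, prefix=""):
        problems = []
        # Entries that exist in only one of the two folders
        problems += [prefix + n for n in cmp.left_only + cmp.right_only]
        # Files that differ, could not be compared, or changed type
        problems += [prefix + n for n in
                     cmp.diff_files + cmp.funny_files + cmp.common_funny]
        # Recurse into sub-folders present on both sides
        for name, sub in cmp.subdirs.items():
            problems += walk(sub, prefix + name + "/")
        return problems

    return walk(filecmp.dircmp(left, right))


# Example with the paths used above (an empty list means no differences):
#   print(differing_paths("/home", "/media/home"))
```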
 

Setting up a static local IP

By default, any computer on the local network gets assigned a new (local) IP address every time it connects to the network. We can check the local IP address by typing

ifconfig

In my case, the local address is 192.168.0.8. Since we will want to connect to the server from other computers on the network without changing the target IP address every time, we can assign a static IP address to the server.

This is done by changing the DHCP settings in your router. Usually, the router settings can be changed through a web interface; in my case this can be reached by typing 192.168.0.1 into a web browser. Set up a DHCP reservation so that the server always gets assigned the same IP address. While we are at it, we'll also set up port forwarding to make the server available for SSH from outside the network by forwarding port 22 (the SSH port) to 192.168.0.8.
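
Before entering a reservation in the router, it can be worth checking that the chosen address is a private one, sits in the router's network and does not collide with the router itself. The sketch below uses Python's ipaddress module with the two addresses from this post; the /24 prefix length is an assumption (a common home-router default) and the function name check_reservation is my own.

```python
import ipaddress


def check_reservation(address, router, prefix_len=24):
    """Sanity-check a planned static address against the router's network."""
    # strict=False lets us build the network from the router's own address.
    network = ipaddress.ip_network(f"{router}/{prefix_len}", strict=False)
    ip = ipaddress.ip_address(address)
    return {
        "network": str(network),
        "is_private": ip.is_private,
        "in_router_network": ip in network,
        "clashes_with_router": ip == ipaddress.ip_address(router),
    }


# Addresses from this post; the prefix length is assumed, not checked.
print(check_reservation("192.168.0.8", "192.168.0.1"))
```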

The server basics: OS, user and ssh

First we are going to set up the basics: an OS (we're going to be using Raspbian), a new user and SSH. Raspbian by default comes with a user pi (password: raspberry) who has full permissions on the system. This is a problem when we connect the Raspberry Pi to the internet, as anyone could mess around with the system. We could just change the password, but we might as well set up a new user and delete "pi" altogether. We are setting up SSH so that we can control the server remotely, both from within the network and from any other computer connected to the internet.

After booting the pi for the first time from NOOBS, we select the Raspbian OS. This will set up the operating system and might take a while to run. After the installation is complete, we reboot the raspberry and find ourselves in the "Raspberry Pi Software Configuration Tool". If this does not come up automatically, start it by typing

sudo raspi-config

In the config tool, we are changing a couple of things:

  1. Change pi password: Even though we are later going to set up a new user, let's change the password from raspberry to something more unique.
  2. Boot to console: This will keep Raspbian from booting into the GUI and instead present us with a console terminal window after boot.
  3. Advanced options - Enable SSH: This will allow us to remotely connect to the raspberry pi.
  4. Advanced options - set hostname: We'll change this from "raspberrypi" to "pi" mostly so that the prompt in the terminal is a bit shorter
  5. Advanced options - memory split: Here we can select how much memory the GPU can have. Since we are going to run the raspberry without graphical interface, we'll set this to the lowest possible option, 16.

After updating all the settings, select finish and reboot the computer. Once prompted, log in with the user "pi" and the password we just updated.

We're now going to set up a new user with the same rights as the current pi user. Type

groups

to see a list of groups the user pi belongs to. In my case, this is

pi adm dialout cdrom sudo audio video plugdev games users netdev gpio i2c spi input

In order to add a new user called phil with the same group memberships, type

sudo useradd -m -G adm,dialout,cdrom,sudo,audio,video,plugdev,games,users,netdev,gpio,i2c,spi,input phil

and set the password for this user:

sudo passwd phil

After entering the password twice, reboot the computer again:

sudo reboot

Once the Raspberry Pi has finished its boot process, log in as the new user. The last thing to do is delete the old pi user account:

sudo deluser --remove-all-files pi

After a while the computer should print "Done." indicating that the user pi and all their files have been removed.
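
One detail from the user setup that is easy to get wrong: useradd expects the -G argument as a single comma-separated list without spaces, while groups prints the names separated by spaces. The sketch below converts one into the other. The function name build_useradd is my own, and it assumes the first name in the groups output is the old user's own primary group, which is left out.

```python
import shlex


def build_useradd(groups_output, new_user):
    """Build a useradd command from the output of `groups`."""
    names = groups_output.split()
    # Assumption: the first name is the old user's primary group (here "pi").
    supplementary = names[1:]
    return ["sudo", "useradd", "-m", "-G", ",".join(supplementary), new_user]


GROUPS = ("pi adm dialout cdrom sudo audio video plugdev games users "
          "netdev gpio i2c spi input")

# shlex.join (Python 3.8+) turns the argument list into a shell command line.
print(shlex.join(build_useradd(GROUPS, "phil")))
```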

Setting up a raspberry pi as home media server

I recently got my hands on a Raspberry Pi Model B+ V2 and am going to play around with it over the next couple of weeks.

The goal is to set up the following:

  • Set up the pi as a headless server in the home network
  • Mount an external USB hard drive
  • Install samba
  • Set up a VPN
  • Install a torrent daemon
  • Install Sonarr
  • Install Seafile (or other self-hosted storage solution for remote access).

I'll be adding a post for each of these and will be starting off with a clean SD card flashed with the most recent NOOBS version (currently 1.4.1 obtained from here).

Pattern discovery in data mining - Week 3

I've stopped following the lecture videos for the coursera data mining specialisation about halfway through this week. In the videos, Jiawei Han speeds through a dozen or so algorithms for finding patterns in sequence and graph databases. Almost no time is wasted on the merits of specific algorithms or how they would be best implemented. It is much more informative to just read the Wikipedia articles on each algorithm.

Nevertheless, I've just completed this week's quiz. Only one question required a bit of coding (see my github repository); the rest of the questions relate to only two algorithms and can be answered by just eyeballing the given example database or graphs.

I will stick around for week 4 of this course, but only to see whether this can get any worse. After a somewhat promising start, this has been the only really disappointing coursera course I've followed so far; it sure seems like a money grab by the University of Illinois. I have to say I'm extremely glad I did not go for the "Signature Track" and pay any money for this uninspired mess.

Pattern discovery in data mining - Week 2

I've updated the github repository to include some code related to week 2 of the coursera course Pattern Discovery in Data Mining. However, this is only a few lines of R code to calculate chi², lift and the cosine measure for a contingency table of items. All of this week's questions could have been answered using pen and paper.
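
For readers who want to see the three measures spelled out, here is a minimal sketch in Python (not the R code from the repository). It uses the standard definitions for a 2x2 contingency table of two items A and B: chi² sums (observed - expected)² / expected over the four cells, lift is P(A and B) / (P(A) P(B)), and cosine is P(A and B) / sqrt(P(A) P(B)). The function name and the example counts are made up for illustration.

```python
from math import sqrt


def association_measures(n11, n10, n01, n00):
    """n11: A and B, n10: A only, n01: B only, n00: neither."""
    total = n11 + n10 + n01 + n00
    row_a, row_not_a = n11 + n10, n01 + n00
    col_b, col_not_b = n11 + n01, n10 + n00

    observed = [n11, n10, n01, n00]
    # Expected counts if A and B were independent
    expected = [row_a * col_b / total, row_a * col_not_b / total,
                row_not_a * col_b / total, row_not_a * col_not_b / total]

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    lift = n11 * total / (row_a * col_b)
    cosine = n11 / sqrt(row_a * col_b)
    return {"chi2": chi2, "lift": lift, "cosine": cosine}


# Illustrative counts only, not taken from the course quiz.
print(association_measures(2000, 1000, 1750, 250))
```

A lift below 1 together with a large chi² points to a negative association between the two items, which is the kind of reading the quiz questions asked for.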

I am not a big fan of this quiz. Answering most of the questions amounted to revisiting the lecture slides and looking up the definition of the relevant section. This is made harder by the fact that some of the definitions are confusing and the notation is not really well explained in all cases.

In any case, I did alright on the test, but I definitely hope there is more actual data mining involved in the coming weeks.

Pattern Discovery in Data Mining - Week 1

Today's the final day of the first week of the first course in the new coursera Data Mining specialization - "Pattern Discovery in Data Mining".

This introduction covered a lot of ground, from a general introduction to transactional databases to frequent patterns and how to identify them (the Apriori algorithm and FP trees were discussed). The lectures total only a bit over one hour, but this was definitely one of the more difficult first weeks of the coursera courses I have followed.

This is not only because the material is in itself quite dense, but also because the lecturer, Jiawei Han, has such a strong accent that the course is sometimes hard to follow. I had to google some definitions to make sure I understood what was going on.

For example, when explaining the difference between closed patterns and max patterns, the slides (and Mr. Han) state: "Do not care the real support of the sub patterns of a max-pattern", which is not extremely helpful when one is struggling with the concepts anyway.

In general though the course is off to a great start - the selection of material is interesting and the quiz was just hard enough to be challenging but not so difficult as to be frustrating.

I will go through some of the code I used this week below (all code for this specialization can be found at https://github.com/phildeutsch/data_mining).

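import itertools
from collections import Counter

def frequentItems(items, tdb, n, s):
    # All candidate itemsets of length n built from the unique items
    itemsets = set(itertools.combinations(items, n))

    # Record an itemset once for every transaction that contains it
    itemTransactions = []
    for i in itemsets:
        for k, v in tdb.items():
            if set(i).issubset(v):
                itemTransactions.append(i)

    # Keep the itemsets whose count reaches the minimum support s,
    # where s is a fraction of the number of transactions
    ret = []
    for k, v in sorted(Counter(itemTransactions).items()):
        if v >= s * len(tdb):
            ret.append([k, v])
    return dict(ret)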

After storing all transactions in a dictionary and creating a list of individual items, I defined a function which outputs all frequent itemsets of a given length n with minimum support s. This code first creates all possible itemsets from the list of unique items in the database. In the second step, each itemset is compared to every transaction in the database and recorded if a match is found. Finally, the function outputs all itemsets that reach the minimum support, together with the number of transactions matching each one.
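
As a point of comparison, the same counts can be obtained by going the other way round: instead of generating every possible itemset and testing it against each transaction, enumerate only the itemsets that actually occur in each transaction. This is an alternative sketch rather than the code from my repository; the function name frequent_itemsets and the small transaction database are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(tdb, n, s):
    """Return {itemset: count} for itemsets of length n with support >= s."""
    counts = Counter()
    for items in tdb.values():
        # Sorting makes the same itemset always appear as the same tuple.
        for combo in combinations(sorted(set(items)), n):
            counts[combo] += 1
    min_count = s * len(tdb)
    return {k: v for k, v in counts.items() if v >= min_count}


# Hypothetical transaction database: transaction id -> list of items
TDB = {
    1: ["beer", "nuts", "diaper"],
    2: ["beer", "coffee", "diaper"],
    3: ["beer", "diaper", "eggs"],
    4: ["nuts", "eggs", "milk"],
    5: ["nuts", "coffee", "diaper", "eggs", "milk"],
}

print(frequent_itemsets(TDB, 2, 0.5))
```

With a minimum support of 0.5 an itemset has to appear in at least three of the five transactions, which only the pair (beer, diaper) manages here.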

Coursera Data Mining Specialisation

I've decided to upgrade my programming skills a bit and get deeper into data mining. In particular, I want to become more adept at handling transactional databases and text processing, two areas which come up frequently at my current job.

That's why the coursera specialization (https://www.coursera.org/specialization/datamining/20?utm_medium=courseDescripTop) came at exactly the right time. I'll be updating this blog with code snippets as I follow along. All code can be found at my github page: https://github.com/phildeutsch/data_mining.