Track when you lock your desktop

Photo CC-BY by Jerry

If you do any sort of work where you have to report your activities in 15-minute increments, you have probably noticed how easy it is to miss things, especially when you are frequently interrupted.

There are numerous partial solutions to this problem, ranging from apps that continuously poke you to update a tracker to ones that simply log everything and sum it up. I use some of them, but I also add another check to my routine to get more accurate timing information for the moments when I wasn't paying attention: I write timestamps to a file whenever the screen is locked or unlocked and when the machine suspends. Over the past year this has made my tracking significantly more accurate, because I always have reference timestamps such as the following excerpt:

Thu Jan 23 09:53:48 CET 2014: SCREEN_UNLOCKED
Thu Jan 23 10:02:21 CET 2014: SCREEN_LOCKED
Thu Jan 23 10:18:16 CET 2014: SCREEN_UNLOCKED
Thu Jan 23 10:27:02 CET 2014: SCREEN_LOCKED

If you want to do this too, just follow one of the examples below. If you want to do something similar and customize it, the problem is rather trivial once you break it down into a trigger and an action. The former is heavily dependent on your environment, the latter could be anything.

Linux/GNOME

To react to these events you'll have to run a command at login which then continuously monitors D-Bus to see if the relevant event has fired; one way to start it automatically is sketched after the script. Here is a working example:

#!/bin/bash

function write_log {
  # Fall back to a generic label if no event name was passed.
  local event="${1:-unspecified}"
  echo "$(date): $event" >> "/home/youruser/time.log"
}

write_log "LOGGED_IN"

# Monitor GNOME for screen lock/unlock signals and log them.
dbus-monitor --session "type='signal',interface='org.gnome.ScreenSaver'" | \
  while read -r line; do
    if echo "$line" | grep -q "boolean true"; then
      write_log "SCREEN_LOCKED"
    elif echo "$line" | grep -q "boolean false"; then
      write_log "SCREEN_UNLOCKED"
    fi
  done
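
To have this start automatically at login, one option is a GNOME autostart entry. This is only a minimal sketch, assuming you saved the script above as /home/youruser/.bin/lockwatch.sh and made it executable; the file and script names are made up, so adjust them to your setup:

cat > ~/.config/autostart/lockwatch.desktop << 'EOF'
[Desktop Entry]
Type=Application
Name=Lock watcher
# Hypothetical path; point Exec at wherever you saved the script.
Exec=/home/youruser/.bin/lockwatch.sh
EOF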

OS X

The only decent trigger method I found (albeit after only 15 minutes of googling) is the paid app Scenario. The app itself should be self-explanatory and lets you put one or several scripts into the respective folders for sleep, lock, log out, etc. Since I'm much more familiar with shell scripting than AppleScript, I'm just using the latter to call the former. Obviously, this could be simplified.

AppleScript in Scenario:

do shell script "~/.bin/screenlock.sh unlock"

Shell script:

#!/bin/bash

if [ "$1" = "lock" ]
then
  echo "$(date): SCREEN_LOCKED" >> ~/time.log
elif [ "$1" = "unlock" ]
then
  echo "$(date): SCREEN_UNLOCKED" >> ~/time.log
else
  echo "$(date): UNKNOWN_ACTION" >> ~/time.log
fi
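
One small thing to keep in mind: the shell script has to be executable, otherwise the do shell script call cannot run it:

chmod +x ~/.bin/screenlock.sh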


Have a better way to do this? Let me know in the comments.

Running Drupal on a proper PHP version with IUS

I recently read Thijs Feryn's post on the maturity of PHP, and he makes a convincing argument for actually using what PHP provides now that 5.5 is stable.

Drupal 7 has seen many improvements over the past year with regard to PHP 5.4 compatibility, and as it turns out it now runs pretty smoothly, though many modules still emit E_NOTICEs due to legacy code. The breakage 5.4 caused in many places when Drupal 7 came out (and let's not even talk about a Drupal 6 site) is not comparable to upgrading from 5.4 to 5.5, which is mostly harmless.

Managing PHP versions

You have basically two options for upgrading PHP on your server: find some pre-compiled packages or build from source. The latter should be avoided by nearly everyone, unless you're willing to diligently subscribe to a release mailing list and compile again and again.

On CentOS/RHEL, which is my preferred hosting platform at the moment, you can rely on the excellent IUS repository. It makes switching between 5.3, 5.4 and 5.5 extremely easy. Once you have IUS installed, you select your packages via the name php5Xu, which means you can switch from 5.3 to 5.5 simply by removing the installed php53u packages and installing their php55u counterparts. All I needed afterwards to get PHP running again was copying /etc/php-fpm.d/www.conf.rpmsave over www.conf, and so 15 minutes later this blog is running on 5.5:


# yum list installed | grep php
php55u-cli.x86_64 5.5.5-2.ius.centos6 @ius
php55u-common.x86_64 5.5.5-2.ius.centos6 @ius
php55u-fpm.x86_64 5.5.5-2.ius.centos6 @ius
php55u-gd.x86_64 5.5.5-2.ius.centos6 @ius
...
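
For reference, the swap itself boils down to something like this. It is only a sketch assuming php-fpm and the extensions listed above, so adjust the package list to whatever you actually have installed:

# Remove the old 5.3 packages and install their 5.5 counterparts;
# check "yum list installed | grep php" for your own extension list.
yum remove 'php53u*'
yum install php55u-cli php55u-common php55u-fpm php55u-gd
# Restore the modified FPM pool configuration left behind as .rpmsave.
cp /etc/php-fpm.d/www.conf.rpmsave /etc/php-fpm.d/www.conf
service php-fpm restart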

An additional benefit is that IUS keeps its packages in sync with the latest upstream release, which is very helpful in determining what you are actually running. For example, your server might report that it is running 5.3.3 (as stock CentOS 6.4 does) even though critical fixes from later versions have been backported by Red Hat, Fedora, et al; with IUS you'll get a proper 5.3.27.

Caveat: Of course, don't try this on production if you don't know what you're doing.

Sterling's Dark Euphoria

Photo CC BY/SA by Matt Biddulph

Bruce Sterling has struck again. At the Webstock '13 conference he gave a talk (as he is prone to do) in which he looked at the web and society in 2013 and produced a fascinating narrative around it.

Stacks

The first important point I took away from his talk is how critically the global online companies we depend on have diverged from traditional 20th-century corporations as well as from the open web itself. He calls these vertically integrated, global organizations stacks and identifies key factors that set them apart from the web, such as a proprietary operating system, a post-Internet, non-jailbreakable stack device (tablets, phones, e-books, et al) and much, much more. How does he put it?

The Internet had users, stacks have livestock.

Interestingly, the stacks are highly unstable, he says, and he believes they shouldn't outlive the Arpanet in the long run (he proves it by cat), and you feel a kind of dark euphoria for what's to come.

In sum: just go watch his talk; I couldn't fit this into 140 characters.

Webstock '13: Bruce Sterling - What a feeling! from Webstock on Vimeo.

A commercial marketing database lets you peek inside

Screenshot of aboutthedata.com's categories, not covered under blog license.

There is an old entry in my backlog of blog ideas which went along the lines of "try to get a copy of what data-mining marketing firms have on file for you." Nothing ever came of that until I saw Ed Felten's post on one of those companies doing just that, with a spiffy site to boot, and I'm really surprised by this.

A quick primer

Most everyone knows that companies are tracking purchasing habits and customer preferences and correlating them with socio-demographic information to sell you more stuff. The who and what is much less widely known, though.

Most often these collection companies are specialized businesses which sell subscriptions to their databases, which they in turn feed from multiple sources; in the U.S. especially, these include private as well as public sources. At the time I jotted down my notes on the topic, ChoicePoint was one of them.

Most of these companies do not stop at providing marketing data to business and government (yes, the U.S. government is not prohibited from using private databases, even if it is not allowed to create those on citizens in the first place). They often have a close relationship to data based on or provided by credit scoring agencies such as Altegrity, TransUnion or ISU, which Hoovers lists as direct competitors of ChoicePoint. The differences in their aggregate datasets are probably small; the impact of the results on individuals often is not. ChoicePoint was later acquired by LexisNexis, which primarily focuses on publication databases, and some divisions by Acxiom, which shows the further agglomeration of different types of databases within these data brokers.

Why aboutthedata.com?

If I had been asked a week ago, I would have concluded that such data brokers have no interest in providing a site such as aboutthedata.com. It seems intuitive that such brokers would want to collect as much data as possible on their subjects without giving them any reason to share less, and ideally without being noticed at all.

This could be an initiative of Acxiom to improve the public perception of data mining firms such as themselves. It could be an attempt to preempt any negative coverage on data brokers due to the continuing public discourse on surveillance et al. Or it could just be an attempt to further improve their data.

Let's try it out

Since I did spend several years in the U.S., I figured they should have at least something on me. I entered my personal data from my last residence and was surprised to find that, in the categories shown above, they had basically nothing except an inferred marital status and income bracket. Both matched, but they could have been extrapolated from the address and age I entered to register. Ed Felten was more successful but apparently also underwhelmed by the level of detail shown.

My speculations on why this is so fall into two basic groups:

1. They are only showing a minimal set

It's possible that Acxiom is only providing their basic result data and keeping any further (and creepier) analysis results to themselves and their clients, but since that's pure speculation I'm going to ignore this avenue for now.

Also, I have generally become skeptical of the predictive power of big data in shaping or determining customer behavior; in my opinion it is often at best on par with a trained sales representative.

2. I didn't provide enough data

Two of the primary categories provided by Acxiom are basically public databases of vehicle and house ownership, neither of which applied to me at the time. Also, I generally did not sign up for loyalty cards. It's possible, and even likely, that Acxiom either really does not have much more information on me or that it is unable to merge incomplete datasets about me into a consistent record.

The latter case highlights the main problem with aboutthedata.com: I can be pretty certain that I'm not seeing the full list of entries Acxiom has on me, simply because a human operator would have to make a judgment call on whether a fragment refers to the same canonical person or not. They are unlikely to be able to manually merge a significant number of entries, which means there have to be significant numbers of entries in their database that they cannot bring together in this web application by algorithm alone. They cannot ask me "is this you as well?" without, in many cases, accidentally disclosing data from someone else, so they have to err on the side of caution, more so than is probably necessary for most of their customers.

Thus, I'm still left wondering what the site is supposed to accomplish. Calm me? Improve the paper spam selection? Not sure.

Bulk create searchable PDF from paper documents in Linux

For some documents you have to retain the original in dead tree storage format. For most documents which arrive in the mail, however, a digital copy is just fine and there really isn't any need to retain the paper version, especially if your computer can store millions of them in the space needed for one paper binder.

To archive such documents you could buy a scanner, maybe even an ADF office appliance that spits out searchable PDFs, and then let the device collect dust for the next few months until another batch needs processing. However, you can also achieve nearly the same result with your phone and a Linux desktop.

Step 1: Capture

First you will need to actually digitize your documents. You can of course use any scanner for this, but a phone can be the perfect device to quickly capture dozens of documents, often vastly faster than a flatbed scanner optimized for photos.

I am personally using the Scanbox for this, but any contraption that can hold your phone or digital camera steady, such as a tripod mounted at a similar distance to the document, should do the trick. Speed is the important factor in my solution, not accurately capturing 6pt legalese in the footer.

Step 2: Format

After capturing, you might first need to rotate your photos to get the pages into portrait orientation. Watch out, though: the generic image preview in GNOME might rotate on the fly from EXIF data without telling you, and if you were to OCR those files you would not get any text out of them. You can check whether your files still need to be rotated by opening them in GIMP, which will ask whether you want to rotate. You can bulk-rotate according to the camera's orientation setting with:

mogrify -auto-orient *.jpg

If your camera orientation did not match your document orientation, you'll have to rotate by hand; the following rotates 90 degrees clockwise:

mogrify -rotate 90 *.jpg

Step 3: Process

Now you are ready to convert your images to PDFs with text in them. Basically, all you need to do is call tesseract and hocr2pdf; the rest of the script below is only concerned with naming things and cleaning up. Thus the packages tesseract-ocr-eng, imagemagick and exactimage should be all that's needed on Debian-based systems; it worked flawlessly for me on Ubuntu 13.04. Essentially, it's a cruder version of Konrad Voelkel's solution.

#!/bin/bash
for f in *.jpg
do
  localname=$(basename "$f")
  filename="${localname%.*}"
  # OCR the image; the hocr config makes tesseract write $filename.html
  tesseract "$f" "$filename" -l eng hocr
  # Merge the recognized text and the image into a searchable PDF
  hocr2pdf -i "$filename.jpg" -s -o "$filename.pdf" < "$filename.html"
  rm "$filename.html"
  # I wouldn't do this, but you could...
  # rm "$filename.jpg"
done

Et voilà, you have a searchable PDF which you can locate with the desktop search of your choice, for example Recoll.
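
If you want to spot-check that the text layer actually made it into a given PDF, pdftotext from poppler-utils is a quick way to do it (the filename here is just a placeholder):

pdftotext some-scan.pdf - | head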

Exporting Remember the Milk data to CSV

Photo CC-BY by johanl

Remember the Milk, RTM for short, is a popular web-based todo list manager. You can access your tasks in the browser or through an app, and they provide a print template. Exporting the actual data is a bit more cumbersome: they consider iCalendar to be the primary backup mechanism, and the only other thing they provide is an Atom feed whose content is a bunch of <span> tags. Nothing you'd want to parse.

However, they also provide an API, and David Waring has made use of it with his rtm-cli Python script. With very little effort I was able to amend his script with a function that creates a CSV file of all general RTM fields, without notes. Since it's based on his ls function, you can use rtm-cli's general filtering functions. If you wanted to export all unfinished tasks, you would execute the following:

$ python rtm -c csv

It would create a UTF-8 file output.csv in your working directory with a structure like this:

ID;List;Title;Completed;Priority;Due date;Tags;Link;
123;"Inbox";"Error reporting broken";;N;;;"http://www.example.com";
124;"Inbox";"Login problem";;N;2013-06-14T00:00:00+02:00;"mysql,apache";;
125;"Inbox";"Add web service";;N;2013-07-21T00:00:00+02:00;;;
126;"Inbox";"Fix IE8";;N;;"ie";"http://www.example.com";
 
You can get the code through a forked repository. If this exporter worked for you, you could let the maintainer of rtm-cli know by choosing Approve in the pull request. 2013-06-24 Update: The maintainer has accepted the pull request.
2013-07-09 Update: If you encounter problems like Tyler did with a missing rtm module, make sure you have pyrtm installed (e.g. pip install pyrtm).
 
Let me know if notes is something you'd need to export, too.
