Aug 21 2007
When windows are not enough
It’s not uncommon for “star” programmers to be an order of magnitude more productive than their colleagues. I believe that a large part of that productivity gap comes not from doing tasks faster, but from not doing them at all. Good programmers make the machine sweat instead.
I recently had a very unpleasant talk with one of our support developers. She was very annoyed, in a defensive mood, arguing that she is already too busy and cannot take on a bit more work. After twenty minutes of talking, she finally calmed down, and we came to the root of the problem. It’s not the programming part of her work that is killing her, but the fact that she has to go through several terminal servers to deploy changes to client environments, and that she has to merge all her changes into several revisions. And she does that for every single change, several times a day. So, in fact, she spent most of her time logging on to different computers and copying files between them. Although she understood the problem herself, the thought of automating those silly tasks never came to her mind.
It’s incredible how, for a profession based around telling machines how to perform dull and error-prone tasks instead of humans, so few programmers think about how to automate their own work. This is especially noticeable with the “Windows generation”, people who never effectively used the command line. GUI programs are, without a doubt, more user friendly and easier to use, but they are bloody hard to automate. When a task has to be done fifty times, or performed across the network, the words “ease of use” take on a completely different meaning. That is the point when good programmers break from the usual practices and find better ways to do the job, and bad programmers stick to their old habits.
Breaking from the old habits
A few years ago, I did some consultancy work for a company whose application generated huge log files on clustered servers. To find the cause of any problem, programmers and support staff transferred gigabytes of data from clients’ web servers to their own computers in order to analyse them. Each transfer took at least 10 to 15 minutes. There was no special reason for pulling all the log files to local machines; people were just used to doing their job like that. I was even more amazed to see that even some “senior” programmers in that organization repeatedly tried to open monstrous log files in Notepad and then wasted the next 20 minutes waiting for their machine to unfreeze. All that just to find a few lines of text, which could have been done in a minute or two using findstr on the server.
When a solution clearly does not scale to the problem, we look for a better one, or a different way to implement it. The solution often requires using a different approach, or different tools. The same holds for non-programming tasks. We must look for better ways to perform daily tasks when the usual tricks don’t work. In this case, the proper solution was to use the command line. Programs like tail and grep were developed at a time when network connections were bad, and computing power was scarce, so they don’t require a lot of typing and clicking. Although available bandwidth and computing power have improved greatly over the last 20 years, such tools are still irreplaceable for processing large text files, especially on remote computers. Sadly, I meet more and more developers who have never heard of those tools. If you are reading these lines and thinking to yourself “what the heck is grep”, here is just a brief introduction to some common tasks that these tools do really, really well:
grep
- get all lines containing a string (or a regular expression) from a large file - for example, get all log items for a particular session or user
- count the number of occurrences of a string (or regular expression) in a file - for example, find how many times someone logged in yesterday
- find all files that contain a string (or regular expression) - for example, list log files that contain a particular username
more
- view the start of a large file - for example, see the IIS log file format which prints column headers in the first line
tail
- quickly view the last few lines of a large file - for example, check out the last exception
- monitor how a log file is changing while other processes are writing into it - great for troubleshooting a live application, because it does not lock the file
awk
- extract lines from a large file and reformat them - for example build URLs by appending fields from a tab-separated log file to the site prefix, or create a block of SQL statements to populate tables from some data export file
- calculate statistics on file content - find top 20 client IPs by web bandwidth used and print the number of requests and total bandwidth used by each one
- split a file into several other files, based on line content - for example, extract multi-line exceptions from a log file
- quickly analyse a comma-separated, tab-separated or any other file format with multiple fields in a line
sed
- “replace all” for large files and sets of files
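To make the list above a bit more concrete, here is a minimal sketch of a few of those tasks. It is written in Python rather than with grep, tail or awk themselves, so it only imitates what the tools do. The function names and the log layout (tab-separated client IP, URL and bytes sent) are assumptions made up for the illustration. The point it tries to show is the same one the tools rely on: read the file one line at a time instead of loading all of it.

```python
import re
from collections import Counter, deque


def grep_lines(lines, pattern):
    """Yield only the lines matching a regular expression (grep-like)."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


def count_matches(lines, pattern):
    """Count the matching lines (similar in spirit to grep -c)."""
    return sum(1 for _ in grep_lines(lines, pattern))


def tail_lines(lines, n=10):
    """Return the last n lines while holding at most n lines in memory."""
    return list(deque(lines, maxlen=n))


def top_clients(lines, n=20):
    """Summarise requests and bytes per client IP (an awk-style report).

    Assumes a hypothetical tab-separated layout: ip, url, bytes.
    Returns (ip, request_count, total_bytes) tuples, largest total first.
    """
    total_bytes = Counter()
    requests = Counter()
    for line in lines:
        ip, _url, size = line.rstrip("\n").split("\t")
        total_bytes[ip] += int(size)
        requests[ip] += 1
    return [(ip, requests[ip], total) for ip, total in total_bytes.most_common(n)]


# A tiny in-memory sample standing in for a real log file.
sample = [
    "10.0.0.1\t/index.html\t500\n",
    "10.0.0.2\t/login\t200\n",
    "10.0.0.1\t/report.pdf\t4000\n",
]
```

All four functions accept any iterable of lines, so an open file object can be passed in place of `sample` and will be consumed line by line.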
Working on Windows is often given as a reason for not using command-line tools, but that is just an excuse. There are several free sets of tools which bring power to the Windows command line. For example, the Cygwin package allows most popular Unix tools to run on Windows. The Windows Resource Kit from Microsoft contains tail and qgrep (a fairly usable variant of grep). PsTools, another package now owned by Microsoft, contains gems like psexec that can run a program on a remote computer. Some tools, like GNU Awk, have been directly ported to Windows by the GNU project.
On any variant of Unix, you can expect these tools to be already installed. Some trimmed-down replacements exist even for the standard Windows command line (like findstr which can be used instead of grep if you don’t look for regular expressions). However, most of these tools can just be copied to a remote machine and started (no need to “install” them), so you don’t require any special privileges to run them on a client’s web servers, for example. You can just drop them on a remote computer and start using them. So don’t even try pulling the entire log file from the customer’s web site just to do a simple search: copy gawk there and process the log on the server.
In addition to being very efficient, these tools can easily be automated (scripted). It does not really matter whether shell scripts, batch files or some scripting language like VBScript, Python or Perl is used. Storing a grep or awk command line along with all the strange parameters will save you from the pain of writing them again tomorrow.
Three strikes, and you are out
“Don’t repeat yourself” is one of the fundamental guidelines of Extreme Programming. Most good programmers today take care not to repeat code, and the ones that are even better find similarities in design and consolidate them, but very few apply the same rule to their habits or non-coding related work. I’m not just talking about half-hour tasks like sorting e-mail and analysing debug logs, but also small jobs like checking whether the client database is on-line or replacing seven occurrences of a field name in a script.
Apart from saving time, using “replace all” will also prevent errors, but you have to test it before updating a 200 MB file in a client’s environment. The same testing is needed for a replacement in a 5 KB script file. There is always some overhead involved in automating tasks, but when does it begin to pay off, and when is it unjustified? The rule I stick to is Roberts’ Rule of Three (from Fowler’s Refactoring: Improving the Design of Existing Code), another version of DRY:
The Rule of Three by Don Roberts
The first time you do something you just do it. The second time you do something similar, you wince at the duplication, but you do the duplicate thing anyway. The third time you do something similar, you refactor.
This is also a good rule of thumb for tasks outside of coding. Read it like this: If you catch yourself doing something for the third time, chances are that you will have to do it again, so automate it. Use the replace all if you have to replace the same string more than two times. More importantly, stick to the rule when things get complicated! If you are not dealing with plain strings, but have to re-format the data a bit, don’t fall back to manual work: utilise regular expressions (if you don’t know how to use regular expressions, now is the time to learn). If this is the third time this week that you are doing the same replacement, don’t use the regular expressions manually, but write a script. When you catch yourself running that script every day (week, month) add it to the scheduled tasks list (or cron jobs).
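As one possible illustration of the “write a script” step, here is a small sketch in Python. The function names and the dry-run flag are assumptions chosen for the example, not part of any particular tool. It applies one regular-expression replacement to a set of files and, by default, only reports how many changes it would make, which leaves room for testing before a client’s file is touched.

```python
import re
from pathlib import Path


def replace_in_text(text, pattern, replacement):
    """Return (new_text, number_of_replacements) for one regex replacement."""
    return re.subn(pattern, replacement, text)


def replace_in_files(paths, pattern, replacement, dry_run=True):
    """Apply the same regex replacement to several files.

    With dry_run=True nothing is written; the returned dictionary only
    says how many replacements each file would receive.
    """
    report = {}
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        new_text, count = replace_in_text(text, pattern, replacement)
        report[str(path)] = count
        # Only rewrite files that actually change, and only when asked to.
        if count and not dry_run:
            path.write_text(new_text, encoding="utf-8")
    return report
```

A typical session would first call `replace_in_files(files, r"\bcust_id\b", "customer_id")` to review the counts, and then repeat the call with `dry_run=False` once the numbers look right. The field name `cust_id` here is made up for the example.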
Automating tasks is especially important when your client has to perform them. Having a nice automated installer instead of a two-page Word document explaining all the steps required to set up the software goes a long way. It makes the task less error-prone so the customers will be happier, and it leaves us with more time for the fun stuff.
Image credits: D Lee/SXC


Great article. I think you really nailed it on the head with the “Rule of Three”. I can’t tell you how many times I’ve applied it, never mind thought it. Very useful.
Another good example is when you see a large project scale up and every developer needs to “set up” their individual machine with development tools. The first thing I do is always create a “devInstaller.exe” that will silently install all the tools you’ll need. This could be the IDE, ftp clients, etc. You name it. If you spend one day to do this, you will save in the long term. Just another example…
{shameless plug}
There is also a commercial grade of tools designed for windows developers called MKS Toolkit for Developers. This product contains over 450 UNIX and Windows command-line utilities that would help any developer become unbelievably more productive.
{/shameless plug}
It strikes a nerve when I see sentences like
“ This is especially noticeable with the “Windows generation”, people who never effectively used the command line ”
Failure to use the right tools is due only to a lack of research, whatever your preference is for gui or commandline. There’s very little you can do at a commandline that you can’t through a gui.
Also for a post on the benefit of shells, where was powershell?
Hi, really good article. Microsoft brought out PowerShell which is amazing. If you know the .net library, you have a programming language in a shell, amazing what you can do with it.
Good article. I would add two more Unix utilities: diff and patch.
Actually, having used Unix in my younger days, I would still have to agree with both Chad and James: ignoring a tool such as PowerShell just shows sheer ignorance on the writer’s part. PowerShell today is leaps and bounds ahead of most Unix shells; the extensibility and the ability to quickly stitch different data and functions together make this the absolute killer tool.
The writer does have a point about the “windows generation” though, a lot are simply, if not ignorant, then semi unaware of how strong a good shell with proper tools can be for much of the day to day routine work that still exists.
“There’s very little you can do at a commandline that you can’t through a gui.”
I don’t buy that at all. As the article said, the thing a command line gives you is the ability to trivially automate absolutely anything - if you can run the command at a command line, you can save the command to a shell script and execute it automatically. Doing that in a GUI, if possible at all, is much more involved (which often discourages you from doing it at all).
sed, gawk, grep, less and a couple of others for Windows:
http://unxutils.sourceforge.net/
I have the same opinion regarding the Rule of Three.
I can’t count how many times that rule has saved my work.
Yes, there’s an over-reliance on GUI interfaces. Ubuntu is trying this with Linux currently. There’s nothing like combining a few commands on the command-line into a shell script. I will check out PowerShell, but does anyone here know if it’s able to pipe things between commands as it’s done in *nix operating systems?
Hahaha, sucker. Like a hypocrite, you don’t apply the advice you give to yourself. How many times have you had to tell windows programmers to automate stuff by switching to command lines? Well, why don’t you automate THAT task? Oh right, in order to do so you would have to create a graphically programmable GUI. Now that’s some fairly cutting-edge stuff and so probably beyond you. But if you just *realized* what needs to be done, you wouldn’t be dispensing your idiotic advice (switch to the CLI) as if it were holy wisdom. Instead you would know that switching to the CLI is an ugly kludge and you would feel embarrassed at having to propose it. The irony is that by your own standards you are a bad programmer.
I personally don’t want a .NET-ified shell: if I want a programming language I can script with, I’ll use Python, Perl, Ruby, etc.
I think that level of integration to the shell is a really Bad Idea™, and is one of the worst ideas Microsoft has had.
Tying the shell that deeply into a framework is singularly stupid, in my opinion, because it now *requires* the user to know the framework (which MS changes on a semi-frequent interval), rather than knowing the shell, and then writing a script/tool in a scripting language: which is why they exist.
I’d like to see more user interfaces for programming utilities provide the equivalent of an Excel macro recording. It’ll lower the learning curve for people struggling with repetitive tasks. An example is one of the open source SCM GUIs, which logs the command-line that was run to a little window below. One minute you’re just a worker drone and the next minute you instinctively know how to automate the task.
Two things. (1) There has to be a more meaningful picture for this blog entry than the one you chose. (2) My friend added command line interface support to his open-source web framework, because he specializes in dealing with clients who need “Practice Management” software. He noticed that the systems he was replacing were remarkably efficient, text-based DOS systems. Read his comments here: http://sourceforge.net/forum/forum.php?forum_id=720938
@Chui Tey
@ “I’d like to see more user interfaces for programming utilities provide the equivalent of an Excel macro recording.”
This is exactly what we don’t need being a top priority feature in most user interfaces. Many user interfaces simply need an “Undo” button. I can’t tell you how many people ask me if I could somehow add an “Undo” button to their task. I’ve seen cases where a properly designed “Undo” button could increase productivity by 2 hours per day! Honestly, even as a power user who rarely touches the mouse and memorizes hundreds of keyboard shortcuts for dozens of applications, the one thing that ALWAYS manages to slow me down is an improperly implemented or non-existent undo button. People make mistakes and they need ways to “erase” them. Incidentally, the “Undo” button is one of the favorite ideas of the famous and deceased Jef Raskin.
Take a look at http://devtools.cedarsoft.org. Made by a lazy developer for lazy developers… This command line interface is made to get your daily tasks done really fast. Released under GPLv3.
At the moment Maven and Subversion tasks are implemented. More to come…
I use textpad to do all my programming in windows, and it does every task you mentioned there. Plus I can program and run macros in seconds to automate tasks that a command line couldn’t. There are many more powerful text editors like textpad that will do the same.
Thanks for the good content. Odd how vituperative some people are.
Keep up the good work.