Coping with non-versioned files whilst collaborating

June 3, 2008

Subversion is great! There, I said it. But how can we collaborate on one and the same project when some files (such as user uploads) are not versioned and are scattered across the developers' filesystems/machines?

How we use SVN

At work, my colleague Tijs and I started a new project this week. As my colleagues and I do with all projects, we each have a local instance of the website running on our own machine (viz. local Apache & local MySQL) under a fake URL http://projectname.local/ (long live Apache vhosts and Windows hosts file manipulation). This way we can develop features separately and commit the changes to the repository when finished. As we update regularly, the new features are rolled out onto each dev machine with every update. The only data that isn't versioned are the database and the user uploads.

Unscattering the data

As Tijs is the main lead on the database design for this specific project (and needs to update/extend the structure every now and then), we decided that I'd connect to his database. That way both of us are using the latest DB schema and the same test data, instead of exchanging .sql dumps every now and then.

The only problem we ran into is that some of the modules we are writing require uploads: when I add a new entry (through my local instance of the site), the data indeed gets nicely stored in Tijs' database, yet the files get stored on my local hard disk (in the files/ subfolder of the website). Time to start fiddling around in order to get all data onto Tijs' Mac, yet let me continue developing locally.

The Plan

The plan we rolled out consists of three steps:

1. Enabling me to surf to Tijs' instance of the project
2. Redirecting all HTTP requests to uploaded files on my machine to Tijs' machine
3. Moving all uploaded files from my machine to Tijs' machine, so that the HTTP requests from step 2 actually resolve to a valid (viz. non-404) location.

1. Enabling me to surf to Tijs’ instance of the project

This one is actually very simple: Tijs configures a vhost named projectname.tijs on his Apache and I edit my hosts file so that projectname.tijs points to his IP-address.

2. Redirecting all HTTP requests to uploaded files on my machine to Tijs’ machine

In order to redirect all requests to files located in the files/ subfolder from my machine to Tijs’ machine, we’re using mod_rewrite. One extra line in my local .htaccess file and I’m good to go:

[code]RewriteRule ^files/(.*)$ http://projectname.tijs/files/$1 [R,NC,L][/code]

For example, that little rule above will redirect a request for http://projectname.local/files/artists/22.jpg to http://projectname.tijs/files/artists/22.jpg, yet keep requests to http://projectname.local/anythingbutfiles/ pointing to my local instance.
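To make the mapping concrete, here is a rough Python stand-in for what that rule does. It is only a sketch: Apache does the real work, `redirect_target` and `REMOTE_BASE` are names made up for this illustration, and it assumes the pattern is matched against the path relative to the site root (as happens in a .htaccess file) and that uploads live under files/ on Tijs' machine, as in the example URLs above.

```python
import re

# Python stand-in for the RewriteRule pattern; the NC flag corresponds to re.IGNORECASE.
FILES_RULE = re.compile(r"^files/(.*)$", re.IGNORECASE)

# Assumed target: the files/ folder on the projectname.tijs vhost.
REMOTE_BASE = "http://projectname.tijs/files/"


def redirect_target(path):
    """Return the URL a request is redirected to, or None if it is served locally.

    `path` is relative to the site root, without a leading slash.
    """
    match = FILES_RULE.match(path)
    if match is None:
        return None  # rule does not apply, the local instance answers
    return REMOTE_BASE + match.group(1)


if __name__ == "__main__":
    for path in ("files/artists/22.jpg", "FILES/artists/22.jpg", "anythingbutfiles/"):
        print(path, "->", redirect_target(path))
```

Only paths that start with files/ (in any letter case) are sent to the other machine; everything else, including paths that merely contain the word "files", stays local.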

3. Moving all uploaded files from my machine to Tijs’ machine

First of all, Tijs shared the folder where his instance of the project is running, and I mounted it as a network drive to which I (quite obviously) assigned T:\.

Then I knocked up a little batch file which copies all the files from my machine to Tijs' machine using the xcopy command. The /E parameter makes it copy recursively (empty folders included), and /C makes it carry on if an error occurs.

[code]xcopy files T:\default_www\files\ /E /C[/code]

Now, as I don't want to copy all files to Tijs' machine each time, I extended the batch file with a second line that deletes all my local files (recursively and without confirmation), so that when no new files have been added, nothing gets copied (as the folders are empty).

[code]del files\*.* /S /Q[/code]

All one has to do now is run that .bat file every now and then or, if you're quite lazy, add it to the Windows Task Scheduler so that it gets called, let's say, every 5 minutes.
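One caveat with the copy-then-delete batch file described above: because of /C, xcopy carries on after a failed copy, and the del line then removes the local file anyway. As a hedged alternative, the same idea can be sketched in Python so that a local file is only deleted once its copy has succeeded. This is not what we actually ran; `move_uploads` is a hypothetical helper written for this illustration.

```python
import shutil
from pathlib import Path


def move_uploads(source, target):
    """Copy every file under `source` to the same relative path under `target`.

    A local file is deleted only after its copy succeeded. Local folders are
    left in place (just like `del /S`, which removes files but not folders).
    Returns the relative paths of the files that were moved.
    """
    source, target = Path(source), Path(target)
    moved = []
    # Build the full list first so that deleting files does not disturb the walk.
    for local_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = local_file.relative_to(source)
        destination = target / relative
        try:
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(local_file, destination)
        except OSError:
            # Comparable to xcopy /C: continue with the next file,
            # but keep the local copy since it never arrived.
            continue
        local_file.unlink()
        moved.append(relative.as_posix())
    return moved
```

With the network drive from above mounted, a call would look like `move_uploads("files", r"T:\default_www\files")`. Running it again when nothing new was uploaded simply returns an empty list.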

That’s it, we’re done here!

Published by Bramus!

Bramus is a frontend web developer from Belgium, working as a Chrome Developer Relations Engineer at Google. From the moment he discovered view-source at the age of 14 (way back in 1997), he fell in love with the web and has been tinkering with it ever since.

Join the Conversation

14 Comments

Tom Klaasen says:

June 4, 2008 at 7:12 am

Seems like a dangerous development setup to me 😛

What happens when Tijs goes on holiday? How can you be sure that what you deploy, is the same as what you developed?

At our company, we believe very strongly in repeatable deployments, and in repeatable development installations. If tomorrow another developer is added to the project, she can have the exact same setup as on our machines in no time. And she won't even need access to one of our machines.

The same goes for our deployment environments: a typical deployment takes 1 to 10 minutes, with no (as in: zero) human intervention.

It just makes me feel comfortable 🙂

Tijs says:

June 4, 2008 at 8:52 am

@Tom: When I am on holiday, Bramus will find all my code on SVN. The files that are copied to my PC are not crucial. Just images. So no worries.

Erik Bauffman says:

June 4, 2008 at 9:22 am

Tom, please note that this is not the way we usually work. Circumstances are different for this project and it seemed cool to try and work this way. This is not one of those things we’re going to be doing again anytime soon 😉

Bramus! says:

June 4, 2008 at 9:32 am

@Tom: If Tijs goes on holiday I’ll know. If he drops ill/sick, I’m screwed (but I know the pass to his Mac, so it wouldn’t stop me from developing ;)).

As an extra, every night he exports the DB and userfiles to the online version (viz. upload via FTP to the SVN server and drop the files, unversioned of course, in the correct folders + dump the local .sql schema and import it on the online DB).

Now, to take the dangerous part out we could set up an extra copy of the site on some (internal) server, both connect to a database on that server and both copy our files to that server. Yet, we're kinda doing that already, by using Tijs' machine as the server 😉

Tom Klaasen says:

June 4, 2008 at 12:35 pm

@Erik I’m glad to hear that 🙂

@Bramus! It’s not only about Tijs’ password, of course. Do the test: take a new machine (even with your favorite IDE and everything installed), and try to get the project running on it. Note down the number of times you said ‘oh, looks like I forgot to do that tiny, very easy to do thingy. And now _that_ little thingy. And now …’ Really. Not in your mind, but for real.

Oh – and a developer’s computer (something somebody is really working on, and doing experiments on) is not a substitute for a development server.

Don't get me wrong: I've _been_ in your situation. And I've taken the _same_ decisions. And it _has_ backfired on me 😛

Dieter says:

June 4, 2008 at 5:39 pm

At Marlon, we are using the SQLYog tool to sync the (mysql-)database on different machines. All code is available through SVN.

SQLYog offers a structure synchronization tool and a data sync tool. Never been easier/faster!

Bramus! says:

June 4, 2008 at 7:35 pm

@Tom: the process of putting up a new site on a local machine has been semi-automated on my pc (3 manual steps + 1 call to my fork-setup.bat file and I’m good to go). Could do it all automagically (viz. skip those 3 steps) but am not planning on extending my .bat-file as I’m the only developer on a PC (left) and those 3 steps are no biggie at all 🙂

On top of that, the dataset we're working with can in no way be compared to a real dataset. Now we have about 20 entries in each module. The live site will presumably hold 2500K+ results per module (at launch). Once we're in the final stages towards the first release, we'll revert to our normal behavior: develop locally, commit changes to the prerelease version of the site (to which the client has access) and, if needed, modify the online DB schema & upload new files to the online version.

Could you elaborate on what exactly backfired? Can't think of anything right away of which we don't have 3 (both on- and off-site) copies. On top of that, all files (except for the user uploads, hence the goal of this article) are stored in the SVN repository, which gets backed up by Netlash's hosting partner (viz. Openminds).

@Dieter: aha, need to check that one … but is there a Mac alternative for my colleagues?

Tom Klaasen says:

June 5, 2008 at 9:06 am

@Bramus Every manual step that one has to take backfires at some point. We're only human, after all. So reduce the number of manual steps 🙂

Secondly, differences in infrastructure tend to generate unexpected results. It is impossible to duplicate a deployment setup in a development environment, but one should try.

The idea of switching behaviour is understandable, but dangerous as well. Why don’t you implement the ‘normal’ behaviour from the start, and make it as lean and mean as possible? I guess the only reason you execute these Apache hacks is because there is no easy alternative… Doing it ‘the right way’ from the start, will force you to make the alternative easier to execute. And it will make the end result more fault-proof, because the deployment process itself has been tested (and executed) multiple times.

I agree that backups are a safety net, but they're just that. A net that prevents you from getting hurt. But it doesn't prevent the failure of the show 😉 A website that's down for 30 minutes right after the (widely announced) go-live, because one forgets to change a setting in a configuration file, is hurt.

Bramus! says:

June 6, 2008 at 3:38 pm

Tijs just left. I am still at the office. One single change in my hosts file (pointing projectname.tijs back to my IP) and I'm working again 🙂

Andrew Flusche says:

June 20, 2008 at 3:22 am

I loved Subversion when I was doing programming for a living. But now that I'm a lawyer, we don't have a kick-butt version-control system. There are document management programs, but nothing as flexible and great as Subversion. *sigh*

balder says:

June 25, 2008 at 2:26 pm

Why don't you put up a simple dev machine where you run the test databases/site that both can access? If you're worried about screwing something up, just use a replica each time there is a big change.

And what if a third developer comes along, and a fourth and…, will they all get the man's password in case he gets ill?

Bramus! says:

June 26, 2008 at 7:22 pm

@Andrew: I feel your pain mate!

@balder: a dev machine would be a good idea indeed, yet we would still need the steps laid out above as all of our custom files would need to make it onto that dev machine and be callable over HTTP 😉

Now, we could take our SVN server for this at work, since our hoster provides us HTTP access to the SVN tags. And this is actually something we do: develop locally, and then finally upload the files via FTP onto the SVN server. By this our clients – if we give them access to the development version of the site – can see all our commits + our test data (and test files) online 🙂

I know, it might sound weird but I have found this a really really good way of working. Just think of the SVN server as an internal machine … maybe that will make it easier to get a hold of the bigger picture behind our way of working 😉

Aaron Bassett says:

July 10, 2008 at 9:32 am

You get to use version control normally!? I’ve been trying to convince the tech director here that we really need a version control system. But so far no dice.

It normally doesn't cause too many problems, as on most projects there will be only 1 developer working on it the vast majority of the time, and we never work on the production environment, always in staging. But still, it's not a great system.

Thinking of trying to get him interested in Git, what source control would you recommend?

Bramus! says:

July 10, 2008 at 12:57 pm

@Aaron: at my previous job I had been requesting to be given time to set up SVN … after a year of whining I just gave up as my boss just wouldn’t see the advantages in it.

So we just continued working with our files stored on the shared project server with all problems that come with it: working on the same file (and saving going wrong), accidentally deleting a file (and no way to recover it except for the overnight copy that was made), overwriting the server copy with a local copy (and no way to go back), nobody claiming that they broke a file (and no way to find out who did it), etc.

I'm glad that at my current job we use SVN. It not only lets us collaborate on projects more easily; it also lets us dig up old versions of code (no need to manually zip up a project) and see who did what (even on line level). Really, push your tech director to set up some kind of versioning, the juice is worth the squeeze (viz. the effort in setting it up is worth it)!

We’re not using it as it should be though as all the sites produced by us are built on our CMS which is constantly being updated (viz. the trunk). We do indeed create new branches (1.1, 1.2) from time to time, yet our tags are no real svn tags: instead of sub versioning a branch to – for example – v 1.2.1 each project/site we create is a tag (in a normal scenario one would set up a new project from a certain tag).

By this we can use the same repository for all our projects, as they all rely on the same codebase. Only difference between them is the layout, and some projects require some extra modules. Above that all our tags are accessible over HTTP via a subdomain of our svn root url, making all changes directly visible on the temporary url before the site goes live 🙂

About Git: Haven't worked with it yet. Heard some great things about it though: It's supposed to be "SVN on steroids" or something like that 🙂
