Skip navigation.
Home
Freeware and Free & Open Source Software for Proprietary OSes

Adventures Upgrading from PostgreSQL 8.2.6 to 8.3

PostgreSQL 8.3 was announced today (Feb. 4). As a Pgsql newbie (I started testing it less than two weeks ago), I thought it would be a useful experience to see what the upgrade process looked like. MySQL (which I have years of experience using) is pretty much a snap to upgrade (I've been upgrading since the 3.x days).

FYI: I've been testing PostgreSQL on a Linux box. But, my experience should be common across platforms. I've been installing from source code to get a good feel of the system. Configuring, compiling, and installing 8.3 was straight-forward. The shock came when I restarted the server and read the error messages:

FATAL: database files are incompatible with server

DETAIL: The data directory was initialized by PostgreSQL version 8.2, which is not compatible with this version 8.3.0

Reading more carefully through the release notes, I read: A dump/restore using pg_dump is required for those wishing to migrate data from any previous release.

This was unbelievable to me. MySQL had required some privilege table changes/fixes and a few other things a couple of upgrades ago. But, I never had to dump and reload my entire database server. And, yet, this is what PostgreSQL forced me to do. Since I am just testing, this was not a huge task. But, what if I had hundreds of databases containing many gigabytes of data? This process would take forever.

The other thing I ran into is that the generic SQL dump (pg_dumpall --clean --column-inserts --verbose > db.out) would not restore (psql -f db.out postgres) at all. The PostgreSQL specific dump (pg_dumpall > db.out) did, however. And, fortunately, I had produced one data dump of each type. However, I also learned that PostgreSQL doesn't recreate the databases during the restore process. I had to manually create each database before restoring the databases' contents. MySQL restores never forced me to do that.

After upgrading and restoring my databases, the Webmin PostgreSQL module and phpPgAdmin could both see and manage the restored databases. However, the Windows client pgAdmin III 1.8.2 (released on Feb. 3) couldn't connect to the pgsql server on the Linux box. It had been working before the upgrade.

I'm going to continue exploring and learning more about PostgreSQL as an insurance for the possibility that Sun's acquisition of MySQL creates problems down the road. But, I think I can see why MySQL is so much more popular than PostgreSQL despite pgsql's richer feature set.

Just 2 commands or

Just 2 commands or what?

http://www.oneunified.net/blog/OpenSource/Debian/pgupgrade.article

I'm not a db admin, but I run

I'm not a db admin, but I run a company that uses postgres so I can comment from a business perspective. My team has spent several weeks trying to upgrade from 8.2 to 8.3. Basically, my view is that Postgres shipped the equivalent of a Microsoft Vista. Microsoft can go on and on about what a big improvement Vista is to XP, but the fact is, it's too damn hard to migrate and too many things that were broken by the "upgrade." I really don't care that it's faster or that it has lots of cool new features. If I want faster, it's a lot cheaper to buy faster hardware. If you break our web app in 500 places that have to be manually fixed, you've shipped a three week hassle, not an upgrade. And now we're being forced to change to keep our OS up to date. Same thing happened when M$ tried to phase out XP even though anyone with an IQ over room temperature prefers XP to Vista.

I'm willing to bet that the postgres groupies here who are making light of the migration process have never actually developed a large database application and have never actually spent time actually "upgrading" an 8.2 app to 8.3. After hours spent fixing these problems, the sexiness of the new features and speed of 8.3 tend to wear off rather quickly. The result will be that postgres users bolt to mysql, just in the same way M$ users are bolting to *ANYTHING* but Vista.

I had a similar problem. I

I had a similar problem.
I had once version 8.0 installed on my machine, but I never used it. Now I wanted to start using it and upgraded to version 8.2. I did excactly as it was described on postgresql manual, except I didn't do the dump, since I had nothing importaint in the database. And now if I want to start the postgresql console I get the same error as the author of this blogpost did. Before the upgrade I deleted the /var/lib/postgresql/data directory but now after running initdb it still knows that the previous version was 8.0 and psql won't start.
What else should I delete to get the new postgresql to forget the previous version database at all, and that initdb would create a whole new database?

kri: Sounds to me like pgsql

kri: Sounds to me like pgsql is looking at one of the old conf files. But, I'm a pgsql newbie. So, I don't claim any special knowledge about upgrades. Two quick suggestions: (1) As Jeff suggested to me, you might want to try the pgsql-general mailing list. (2) I suggest checking the pgsql FAQs first though. I've been avoiding mailing lists myself because some have been difficult to get off of and even the digests get too annoying after a while.

Some people learn the hard

Some people learn the hard way that docs exist to be read.

The most important lesson is: next time, paying attention to the commands used instead of following some "Internet recipe" and reading the docs will save you a long time and a lot of unnecessary embarrassment.

Hey, "anonymous", if you

Hey, "anonymous", if you would learn to read posts instead of simply launching pompous anonymous insults, you would know that I read and quoted from the installation instructions (I downloaded the entire 8.3 PDF manual). The "official" 8.3 instructions were the basis for both pgsql dumps. The fact is that the generic SQL dump comes directly from the instructions. The dump itself worked but pgsql could not restore from that dump. In other words, following the instructions did not work. Quite frankly, it is people like you that keep people away from trying PostgreSQL. Good work.

What do you mean "would not

What do you mean "would not restore"? The "--clean" option is for removing pre-existing data in the database, so if you're restoring into an empty database it makes sense that you'd get errors like "_________ doesn't exist", but that doesn't mean your data wasn't restored. What actual errors are you getting?

I agree that dump/reload is not an ideal solution. There has been some significant effort toward in-place upgrades, and I think it's a possibility in the near future. In the meantime, you have to use something like Slony if you have a lot of data (and that's not an easy process either).

Why did you have to create each database again? That's the point of pg_dumpall. pg_dumpall will recreate all your databases, users, and everything else, while pg_dump is the one that only dumps a single database. Again, without more information it's hard to figure out what went wrong (if anything).

Also, post things like this to pgsql-general. You will no doubt get all your problems resolved very quickly, and it may lead to better documentation so that other users don't have similar problems.

I'm pretty sure there is an

I'm pretty sure there is an option to pg_dumpall that tells it to include the create statements for existing databases in the output. This would save you having to manually recreate them before loading from the dump.

In MySQL and other SQL

In MySQL and other SQL database servers, a equivalent to "clean" is a precaution in case there is an existing database that you want to replace. However, if the database does not exist, the restore just goes ahead as expected. In other words, it works both ways. Here's the MySQL use case: You make a regular data dumps and have it include the equivalent to clean. If you restore a database back to the same server, it would overwrite the existing database. However, if you restore the same dump on a new machine running MySQL, it would detect that the database is not there but continue on and restore the database. In pgsql, it failed when it did not detect a pre-existing database.

It would have been less

It would have been less painful if you'd have found the manual section on upgrading before going through this, but learning the hard way is certainly good for making any lessons stick.

pg_dumpall does create the databases for you; I'm not sure what issue you ran into but a regular old "pg_dumpall > db.out" starts by creating roles and then does databases, schemas, tables, and data. I'm not quite sure where you got the syntax for what you're calling a "generic SQL dump" because that's still plenty of PostgreSQL specific stuff in that format; INSERT-based dumps are not the best choice for version upgrades.

As for the larger issues here...anyone who has a database with hundreds of gigabytes of valuable data doesn't just apply a major update to the live database; that's crazy. What you do is bring up a new instance running the new version, test test test, and if everything works great only then do you consider switching to the new one. The PostgreSQL model is admittedly a little harder than it should be, but anyone who has a database with enough data that the dump/reload will take a while is less likely to find an in-place upgrade useful anyway.

The standard way to solve this problem for larger environments (my largest PostgreSQL database is 2TB) is to use the Slony add-on to replicate the data. You can bring up identical copies of the data in 8.2 and 8.3, for example, and keep them in sync until you're ready to swap to the new one.

As for MySQL, it's about 2.5 years since their last major production version was released, so it's no wonder people aren't used to feeling any upgrade pain there. It's adding disruptive new features that breaks software backward compatibility, and as long as they're not doing that upgrades are easy.

I have found that the dump

I have found that the dump restore for a major release is not that big of a deal as I generally have less databases with PGSQL vs MySQL, the reason being that PGSQL supports schemas. Now before you go off say that MySQL has schemas, they do sort of . In MySQL they implement schemas as seperate databases, on PGSQL a schema is a namespace in the same database,so it is like having multiple databases in one, so you end up with less physical databases.

I do wish however that the backup process was more server oriented instead of fat client oriented. It makes backing up from admin programs more difficult.
Ideally pg_dump should run on the server with a client API where you simply connect to the server and dump to your local PC if you want or just leave the backup on the server. The current implementation requires a separate pg_dump.exe for each version of the server you want to backup.

If you had called pg_dump

If you had called pg_dump with the proper options, restore would have created the databases:

pgdump -? reveales:
-C, --create include commands to create database in dump

And pg_dumpall recreates the whole engine with all databases.

Better to read release notes next time you try to upgrade :-)

My mileage with MySQL varies a lot: It was impossible to lift 3.23-type databases to 4.x or 5.x, because they changed the timestamp format between versions completely incompatible. Such a thing never happened to me with PostgreSQL (And I use it since more than 8 years quite heavily).

A dump/restore is normal

A dump/restore is normal practice when doing major-version upgrades of PostgreSQL - and yes, 8.2 -> 8.3 is a major version! This is flagged pretty clearly both in the release notes, under the heading "Migration to version 8.3" (near the top), and in the "upgrading" part of the installation guide.

Major-version upgrades are something you'd plan pretty carefully with any DBMS, scheduling downtime if necessary, or quite possibly (and painlessly) using Slony (PG's replication system) to do the data transfer for you if downtime is an issue. This is one of the scenarios which the Slony designers had in mind when creating it, and it works very well. If you do go down the dump-reload route, don't use the --column-inserts option either if the data set is large, as this is pretty slow; instead allow pg_dump to default to generating COPY commands for reloading data, which is really fast.

Regarding the PgAdmin connection problem, did you add a line to pg_hba.conf to allow the client machine to connect? Out of the box, PostgreSQL disallows connections from remote machines until you explicitly enable them by editing this file, which is very well documented via comments.

Finally, if you're new to PostgreSQL, it's well worth subscribing to the pgsql-general mailing list (see the "Community" section of the website) - the support and help that'll you'll get there is *unbelievably* good.

Ray: Thanks for your

Ray: Thanks for your detailed response. As I mentioned in the original post, I verified from the installation instructions that a dump/restore was necessary. The point is that this is an incredibly poor practice to adopt as a standard. It is incredibly time consuming to dump and then restore. Any database requires regular backups (a dump of some kind). But, MySQL, for example, does not require a full restore after a major backup (say going from version 4 to 5). It also doesn't make sense that PostgreSQL could not restore its own generic SQL dump or that it required each database to be reinitiated before it could be restored. The pgAdmin problem did indeed turn out to be a pg_hba.conf problem. This was probably because I rebuilt from source (instead of an RPM install) and it overwrote my previous 8.2 version. I assumed that the "make install" would work the way Apache httpd's does during a source code install and check for an existing conf file and use it instead of a new raw conf file.

Actually, the dump restore

Actually, the dump restore is pretty common with databases.

Where I used to work we were upgrading from Oracle 9i to 10g. You can bet there was a migration plan, and it took literally weeks to prepare for it. Dumping and restoring was a part of that plan. In fact, Oracle was far more difficult to upgrade than pgsql was.

Due to the way pgsql is made, it's unlikely that dump/restore for major version upgrades is going away anytime soon.

Note that it's considered best to dump from the version you're going TO, not FROM, so it's handy to have both versions installed at once.

On debian / ubuntu, it's easy to do and there's a command pg_upgradecluster that takes as argument the versions and the locations of the two clusters (a pg installation is called a cluster) and then initdbs the new one like the old one, and initiates a pg_dumpall from the old to the new and voila, you're migrated. Really, it's pretty seamless.

Hi Todd, The reason for the

Hi Todd,

The reason for the dump/restore cycle is that there are changes in the disk file structures between major versions. I'm not involved in the development of PG so I can't give you any more detail than that, but subjectively I'm more reassured by a product that moves on from version to version! :-) Yes, it can be time consuming for a large database (and I'll point out that by using --column-inserts you opted for the slowest possible option), but then if your database is large enough that it is a major issue you won't upgrade in this way anyway - you might use Slony, for example, to replicate your data to the new server until it is caught up (while the old one is still in production), then switch your web servers (or whatever) to point at the new machine.

For the record, PostgreSQL doesn't require you to recreate databases manually before restoring from a script produced by pg_dumpall, (except, maybe, if you use the --data-only option, but I've never tried this so can't say either way). If you open the output script in a text editor you ought to see a series of CREATE DATABASE commands, and then every so often a "\c " command so that psql connects to each database in turn. I know this isn't a support forum, but when you say it "didn't work", what happened exactly? The whole point of pg_dumpall, after all, is to backup your entire cluster in one go - databases, roles, etc etc, so it wouldn't make a lot of sense if it didn't recreate the databases! If you're having problems you should consider subscribing to the mailing list, as I mentioned before, where some frighteningly clever people will help you out. :-)

Finally, the source install didn't pick up on your existing pb_hba.conf because initdb creates entirely a new cluster and leaves your old one untouched. I know that I'd prefer it to do things that way; for example, when installing alongside an older version on the same box, you can verify that all is well before shutting down the old version and firing up the new. You can even run multiple versions alongside each other (as long as they listen on different ports) - indeed, you'd have to in order to use Slony as an upgrade path, as I mentioned above.

There is always some history

There is always some history to why software decisions are made one way for one project, but differently for another. I think the reason that pg_dump doesn't create the databases for you is that it is aimed more as a general backup tool. The ability to take dump output and reload it into any database, regardless of name, has been seen as one upside. Also this allows database owners to manage thier own backups/restores without having to grant them database creation privileges. If you want to include the create database command, you can use --create flag to pg_dump to make it so.

WRT your statement that it couldn't reload the dump file that was created, that does seem odd, though I can think of some cases where you might run into problems. Without further information it's hard to give any explination, but should you find yourself in that siutuation again, please send an email to the pgsql mailing lists with details.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.