[wplug] Large Database

Bill Moran wmoran at potentialtech.com
Fri Mar 6 10:58:37 EST 2009


In response to George Larson <george.g.larson at gmail.com>:

> I don't mean to 'jack the thread but would you mind to clarify why you chose
> PostgreSQL over MySQL?  We use MySql at work, so I'm just curious....

Do you have over 100,000,000 rows in any table?

It's been a while since I abandoned MySQL for various reasons, but a few
years ago, putting over 100,000,000 rows in a MySQL database would basically
kill the thing.  Granted, the MySQL programmers are constantly improving,
and there have been many releases, so that performance issue may be gone.

I still have good cause to choose PostgreSQL over MySQL, though.  And that
is that the PostgreSQL community is full of people who have been using
PostgreSQL for databases that larger for many, many years.  Which means
joining the PostgreSQL mailing list and asking for help if you hit big
database problems is going to reward you with hundreds of man-years of
experience in making PostgreSQL handle large databases.

> On Fri, Mar 6, 2009 at 9:56 AM, Philip Reiche <Philip.Reiche at pl.netl.doe.gov
> > wrote:
> 
> > How big and complex are the records, and how fast are you reading them? Can
> > you load them into some other data structure say, a sparse array, during
> > data aq and then process the recorded data offline?
> >
> > >>> "Bill Moran" <wmoran at potentialtech.com> 3/6/2009 9:37 AM >>>
> >
> > In response to DK <wplug at curlynoodle.com>:
> > >
> > > I need to implement a data acquisition system which will sample and
> > > store large amounts of "time-series" data, that is hundred of millions
> > > of records.  I would like to investigate using an open source
> > > database.  Does anyone have suggestions?
> >
> > PostgreSQL is frequently used for datasets that size.
> >
> > However, whatever you choose, you're going to have to take some time to
> > tune your methodology if you're dealing with that much data.  You're
> > crossing a line between what ordinary people expect computers to do and
> > what _REAL_ number crunching is like.
> >
> > You will _NEVER_ get the kind of performance out of a database that size
> > that the average user gets out of his MS Access database that he uses to
> > catalog his library of mp3s.  You will have queries that take hours to
> > run.  This will be the case regardless of what DB system you use ... pay
> > $infinity for Oracle and the performance won't be any better.  It's just
> > the realities of scale.
> >
> > You will probably also find that collecting said data will require various
> > forms of "trickery".  If you just try to insert it one row at a time,
> > you'll
> > spend weeks getting it into the database.  If you're trying to capture it
> > "real time", you'll probably find that it's impossible to do.  There are
> > LOTS of tricks to getting this to work, but the standard methods of
> > data processing usually don't scale up to recordsets that size.  Again,
> > it doesn't matter what DB you use, you'll come across these same problems.
> > Keep in mind, that depending on the size of each row, you're talking about
> > 10s of G of data ... possibly 100s of G.
> >
> > With that warning in mind, I further recommend PostgreSQL.  I've talked to
> > many people dealing with datasets that size, and PG handles them as well
> > as anything you'll pay money for.  Personally, I have a few databases with
> > 10s of millions of rows and have had no problems with PG's performance, but
> > I've yet to push a database into the 100s of millions.  I'll be interested
> > to see how it works for your project.
> >
> > --
> > Bill Moran
> > http://www.potentialtech.com
> > http://people.collaborativefusion.com/~wmoran/<http://people.collaborativefusion.com/%7Ewmoran/>
> > _______________________________________________
> > wplug mailing list
> > wplug at wplug.org
> > http://www.wplug.org/mailman/listinfo/wplug
> >
> >
> > _______________________________________________
> > wplug mailing list
> > wplug at wplug.org
> > http://www.wplug.org/mailman/listinfo/wplug
> >
> >
> 


-- 
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


More information about the wplug mailing list