[wplug] Large Database

George Larson george.g.larson at gmail.com
Fri Mar 6 10:19:48 EST 2009


I don't mean to 'jack the thread, but would you mind clarifying why you chose
PostgreSQL over MySQL?  We use MySQL at work, so I'm just curious....



On Fri, Mar 6, 2009 at 9:56 AM, Philip Reiche
<Philip.Reiche at pl.netl.doe.gov> wrote:

> How big and complex are the records, and how fast are you reading them? Can
> you load them into some other data structure, say a sparse array, during
> data acquisition and then process the recorded data offline?
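>
> As a rough illustration of what I mean by buffering during acquisition and
> deferring the heavy work (read_sample() and the output path are just
> placeholders for whatever the acquisition hardware and downstream
> processing actually look like):
>
>     import array
>     import time
>
>     timestamps = array.array('d')   # sample times, seconds since epoch
>     values = array.array('d')       # sampled readings
>
>     def acquire(duration_s, read_sample):
>         # Keep the hot loop cheap: append to in-memory buffers only,
>         # and leave all database work for later, offline.
>         end = time.time() + duration_s
>         while time.time() < end:
>             timestamps.append(time.time())
>             values.append(read_sample())
>
>     def dump(path):
>         # Offline step: write the buffers to a flat file that can be
>         # bulk-loaded into a database (or processed directly).
>         with open(path, 'w') as f:
>             for t, v in zip(timestamps, values):
>                 f.write("%f\t%f\n" % (t, v))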
>
> >>> "Bill Moran" <wmoran at potentialtech.com> 3/6/2009 9:37 AM >>>
>
> In response to DK <wplug at curlynoodle.com>:
> >
> > I need to implement a data acquisition system which will sample and
> > store large amounts of "time-series" data, that is, hundreds of millions
> > of records.  I would like to investigate using an open source
> > database.  Does anyone have suggestions?
>
> PostgreSQL is frequently used for datasets that size.
>
> However, whatever you choose, you're going to have to take some time to
> tune your methodology if you're dealing with that much data.  You're
> crossing a line between what ordinary people expect computers to do and
> what _REAL_ number crunching is like.
>
> You will _NEVER_ get the kind of performance out of a database that size
> that the average user gets out of his MS Access database that he uses to
> catalog his library of mp3s.  You will have queries that take hours to
> run.  This will be the case regardless of what DB system you use ... pay
> $infinity for Oracle and the performance won't be any better.  It's just
> the realities of scale.
>
> You will probably also find that collecting said data will require various
> forms of "trickery".  If you just try to insert it one row at a time, you'll
> spend weeks getting it into the database.  If you're trying to capture it
> "real time", you'll probably find that it's impossible to do.  There are
> LOTS of tricks to getting this to work, but the standard methods of
> data processing usually don't scale up to recordsets that size.  Again,
> it doesn't matter what DB you use, you'll come across these same problems.
> Keep in mind that, depending on the size of each row, you're talking about
> 10s of GB of data ... possibly 100s of GB.
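>
> As one example of the kind of trickery meant here, PostgreSQL's COPY can
> stream a whole batch in a single call instead of paying a round trip per
> row.  The sketch below uses psycopg2; the table name, columns, and
> connection string are placeholders, not anything from the original post:
>
>     import io
>     import psycopg2
>
>     conn = psycopg2.connect("dbname=acq")   # placeholder connection string
>     cur = conn.cursor()
>
>     def insert_one_at_a_time(samples):
>         # The slow path: one statement and one round trip per row.
>         for ts, val in samples:
>             cur.execute("INSERT INTO samples (ts, value) VALUES (%s, %s)",
>                         (ts, val))
>         conn.commit()
>
>     def bulk_load(samples):
>         # The faster path: build a tab-separated batch in memory and
>         # push it through COPY in one call.
>         buf = io.StringIO()
>         for ts, val in samples:
>             buf.write("%s\t%s\n" % (ts, val))
>         buf.seek(0)
>         cur.copy_from(buf, "samples", columns=("ts", "value"))
>         conn.commit()
>
> On a dataset this size the difference between those two approaches can
> easily be orders of magnitude in load time.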
>
> With that warning in mind, I further recommend PostgreSQL.  I've talked to
> many people dealing with datasets that size, and PG handles them as well
> as anything you'll pay money for.  Personally, I have a few databases with
> 10s of millions of rows and have had no problems with PG's performance, but
> I've yet to push a database into the 100s of millions.  I'll be interested
> to see how it works for your project.
>
> --
> Bill Moran
> http://www.potentialtech.com
> http://people.collaborativefusion.com/~wmoran/
> _______________________________________________
> wplug mailing list
> wplug at wplug.org
> http://www.wplug.org/mailman/listinfo/wplug
>