I don't mean to 'jack the thread, but would you mind clarifying why you chose PostgreSQL over MySQL? We use MySQL at work, so I'm just curious.

On Fri, Mar 6, 2009 at 9:56 AM, Philip Reiche <Philip.Reiche@pl.netl.doe.gov> wrote:
How big and complex are the records, and how fast are you reading them? Can you load them into some other data structure, say a sparse array, during data acquisition, and then process the recorded data offline?
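A minimal sketch of that capture-now, process-later idea, in Python. The fixed (timestamp, channel, value) record layout and the file paths are assumptions, just to make it concrete:

import struct

RECORD = struct.Struct("<dId")  # timestamp, channel id, value

def acquire(sample_source, path):
    """Hot path: append raw fixed-size records to a flat binary log."""
    with open(path, "ab", buffering=1024 * 1024) as log:
        for ts, channel, value in sample_source:
            log.write(RECORD.pack(ts, channel, value))

def replay(path):
    """Offline pass: stream the log back for processing or bulk loading."""
    with open(path, "rb") as log:
        while True:
            rec = log.read(RECORD.size)
            if len(rec) < RECORD.size:
                break
            yield RECORD.unpack(rec)

The acquisition loop does nothing but append; all parsing, aggregation, and database loading happen later, off the hot path.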
>>> "Bill Moran" <wmoran@potentialtech.com> 3/6/2009 9:37 AM >>>

In response to DK <wplug@curlynoodle.com>:
> I need to implement a data acquisition system which will sample and
> store large amounts of "time-series" data, that is, hundreds of millions
> of records. I would like to investigate using an open source
> database. Does anyone have suggestions?

PostgreSQL is frequently used for datasets that size.

However, whatever you choose, you're going to have to take some time to
tune your methodology if you're dealing with that much data. You're
crossing a line between what ordinary people expect computers to do and
what _REAL_ number crunching is like.

You will _NEVER_ get the kind of performance out of a database that size
that the average user gets out of his MS Access database that he uses to
catalog his library of mp3s. You will have queries that take hours to
run. This will be the case regardless of what DB system you use ... pay
$infinity for Oracle and the performance won't be any better. It's just
the realities of scale.

You will probably also find that collecting said data will require various
forms of "trickery". If you just try to insert it one row at a time, you'll
spend weeks getting it into the database. If you're trying to capture it
in "real time", you'll probably find that it's impossible to do. There are
LOTS of tricks to getting this to work, but the standard methods of
data processing usually don't scale up to recordsets that size. Again,
it doesn't matter what DB you use; you'll come across these same problems.
Keep in mind that, depending on the size of each row, you're talking about
tens of GB of data ... possibly hundreds.
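To put a number on that: at, say, 100 bytes per row, 300 million rows is
already ~30 GB before a single index. And to make the "one row at a time"
point concrete, here's a rough sketch of the usual fix in PostgreSQL:
stream batches through COPY instead of issuing per-row INSERTs. It uses
the psycopg2 driver; the samples table, its columns, and the connection
string are invented for illustration:

import io
import psycopg2

conn = psycopg2.connect("dbname=daq")  # hypothetical database
cur = conn.cursor()

# The slow way: one round trip per row. At hundreds of millions of
# rows this is what turns a load into a multi-week job.
#   for ts, channel, value in samples:
#       cur.execute("INSERT INTO samples VALUES (%s, %s, %s)",
#                   (ts, channel, value))

def load_batch(rows):
    """Stream a batch of (ts, channel, value) tuples through COPY."""
    buf = io.StringIO()
    for ts, channel, value in rows:
        buf.write("%s\t%d\t%r\n" % (ts, channel, value))
    buf.seek(0)
    cur.copy_from(buf, "samples", columns=("ts", "channel", "value"))
    conn.commit()

Other tricks in the same spirit: create indexes after the load instead of
before, batch many rows per commit, and partition by time so old data can
be queried or dropped without touching the rest.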
With that warning in mind, I further recommend PostgreSQL. I've talked to
many people dealing with datasets that size, and PG handles them as well
as anything you'll pay money for. Personally, I have a few databases with
tens of millions of rows and have had no problems with PG's performance, but
I've yet to push a database into the hundreds of millions. I'll be interested
to see how it works for your project.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/
_______________________________________________
wplug mailing list
wplug@wplug.org
http://www.wplug.org/mailman/listinfo/wplug