[wplug] Starting From Scratch

Rob Lines rlinesseagate at gmail.com
Thu Apr 26 09:33:07 EDT 2007


On 4/25/07, Michael H. Semcheski <mhsemcheski at gmail.com> wrote:
>
> I didn't know LTSP could run some apps on the server and some on the
> client.  That's a pretty good feature.
>
> There's going to be a lot of matlab, and a significant (though not
> mountainous) amount of data being collected.  I feel kind of like, if we're
> going to have thick clients, then they might as well be proper thick
> clients.  I'm leaning toward using autofs and nfs.  In for a penny, in for a
> pound.
>
> The database with replication services is a cool idea too.  It's pretty
> easy to convert anything into a stream and store it in a blob field.  One
> thing, though, is in my experience, some databases have trouble replicating
> blobs.
>
> Lots of great ideas in this thread.
>
> Mike
>
> On 4/25/07, n schembr <nschembr at yahoo.com> wrote:
> >
> > I've used ltsp in a past life. Ltsp  works well.
> >
> > ltsp has the option to run applications locally. You can pick which
> > applications are run on the server vs the client.
> >
> > rsync can be used to replicate your data.
> >
> > If you have a lot of data, I would consider a database with replication
> > services.  Make each client part of the database cluster.
> >
> >
> > Nicholas A. Schembri
> > State College, PA, USA
> >
>
If MATLAB is one of your key programs, one thing to look at is their cluster
solution.  That way you could have all of the machines in the group available
for running MATLAB code, maximizing the use of the CPU cycles without needing
one 'big box' to do all the processing.  Also, I know you use Ubuntu at home,
but check with your MATLAB rep and see what they are willing to support.  You
want to use a release that the commercial vendors will support, or that very
expensive piece of software can quickly become a very expensive coaster.  I
have had good experience running MATLAB (the individual version, not the
clustered one) under CentOS 4.4.

Something that is often overlooked is good documentation of the whole build
process.  When you decide on something, write it down.  If you decide to go
with NFS, write down why it was a good choice, and if you looked at other
options, write down why you didn't go with them.  Decide where things are
going to be installed, and then record where they actually ended up once you
installed them.  Keep a change log so the day-to-day admin can see what is
going on; if it is kept up to date you will have something to refer back to
when trying to fix a problem.  A method that has worked well for me recently
is a simple wiki.  It doesn't take much in the way of resources and it is
easy to get at while you are making the changes.  A log book can work just as
well, though, as long as the information is kept up to date and available to
the people maintaining the system.

One more thing: while saving money is a good thing and we all want to do it,
be careful about skimping on the long-term technology.  Things like the
network switches should outlast two or three life cycles of the computers.
Right now the cost of a decent unmanaged gigabit switch is hardly more than
the cost of a 100 Mbit switch.  Yes, you may save 25% by purchasing the
100 Mbit switch, but every time a file transfer takes two or three times as
long and the users are sitting there waiting, or you are running code that
ends up I/O-starved because of a slow network connection, that $100 will
seem like a drop in the bucket.

The same goes for the file server, if you decide one is needed (it is almost
always a good idea): try not to go cheap and just use a desktop with
commodity hardware.  Buy a real server with some redundancy for power and
drive storage, and don't discount a UPS.  Also have a backup strategy.  If
there is the possibility of putting a machine in another building or
location but on the same network, you can't beat setting up an rsync script
(a rough sketch is below) to mirror the data on the server to a second
machine.  You can always do the first sync with both machines in the same
room on the same switch, then move the second one out of the building and do
daily syncs, or however often you feel is safe.

For that near-line backup it can be good to buy some of the very large
consumer drives (750 GB or 1 TB), because they will not be stressed all the
time, only once a day.  This isn't a full solution on its own; a portable
drive or tape should still be used to take real weekly backups for off-site
archival.  Disaster recovery and backup can be expensive, but it is even
more expensive to lose three years of scientists' work and have them start
over: ~6 scientists x 3 years x ~$75k a year = ~$1.35M, versus maybe $5k for
an extra machine, some large drives, and a few pieces of removable media to
rotate.  Even if you don't use Iron Mountain or another off-site company,
find a way to get the removable media really off site.
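
Just as a sketch of the mirroring idea mentioned above, something along
these lines run nightly from cron on the backup machine would do it.  The
host name and paths are placeholders, and --delete makes the mirror track
the source exactly (deletions included), so double-check the paths before
turning it on:

  #!/usr/bin/env python
  # Nightly mirror of the file server onto a local disk on the backup box.
  # "fileserver" and the paths below are only examples.
  import subprocess, sys, datetime

  SRC = "fileserver:/srv/data/"   # trailing slash: copy contents, not the dir
  DEST = "/backup/data/"
  LOG = "/var/log/mirror-%s.log" % datetime.date.today()

  log = open(LOG, "w")
  # -a preserves permissions and times, --delete makes DEST an exact mirror
  rc = subprocess.call(["rsync", "-a", "--delete", "--stats", SRC, DEST],
                       stdout=log, stderr=subprocess.STDOUT)
  log.close()
  sys.exit(rc)

Drop that in a cron entry on the backup machine and you have your daily
sync.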

Get spares of anything that could cause a work stoppage: a few extra patch
cables, a replacement GBIC (if you use them), and extra hard drives of each
size you use, and label them all.  If a drive is a hot-swappable spare for
the server, put a note on it saying so, especially if the group is supposed
to support themselves.

Once you get the machines set up, I encourage you to use something like
ghost4linux to take an image of each machine.  That way you can restore a
machine to a known good working state in 20 or 30 minutes instead of having
to rebuild it by hand and hope you get it right.  The image files can get a
little big, but if you throw an extra 750 GB drive in each of the two
servers you can store them and mirror them at a fairly low cost (compared to
the cost of your time).
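
ghost4linux itself is a boot CD you drive from its menus, but the underlying
idea is just dumping the raw partition to a compressed image while the
machine is booted from something other than its own disk.  As a rough
illustration only (the device and destination here are made up, and this is
not how ghost4linux is actually operated), the idea looks like this:

  #!/usr/bin/env python
  # Dump a partition to a gzipped raw image.  Boot from a live CD first so
  # the partition isn't mounted while it is being read.
  import subprocess, datetime

  DEVICE = "/dev/sda1"                                  # partition to image
  DEST = "/mnt/images/node01-%s.img.gz" % datetime.date.today()

  dd = subprocess.Popen(["dd", "if=" + DEVICE, "bs=1M"],
                        stdout=subprocess.PIPE)
  out = open(DEST, "wb")
  gz = subprocess.Popen(["gzip", "-c"], stdin=dd.stdout, stdout=out)
  dd.stdout.close()   # so dd gets a broken pipe if gzip dies early
  gz.wait()
  dd.wait()
  out.close()

Restoring is the same thing in reverse: gunzip the image and dd it back onto
the drive.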

The last one is to make a plan, and try to define from the get-go who is
going to do what and, once it is up and running, who is responsible for it.
I couldn't make out whether you are just helping out or whether you work for
IT, but it is very important that you know who should be doing what.  If it
isn't part of your job and is just something you are doing to help out, your
boss would probably get upset if you had to go over there every time someone
couldn't remember where they saved something.  If you spell it out at the
beginning, it will cut down on the problems later when you don't have time
to help out.

Best of luck to you on this project.  It should be a lot of fun to build
something from the ground up with all the new technology.