
modred team mailing list archive

Re: Fwd: Concept stuff

 

Man, that sounds pretty complicated. How big a cluster are we trying to
build, now?

On Mon, Dec 28, 2009 at 10:54 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

> I like this. And actually, the cluster could use the designated values
> to estimate how long tasks will take.
>
>
> Mikey: can you go to launchpad and make a blueprint for this?
>
> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
> > The idea here is that if people are smart, stuff should basically
> > acquire a "market value."  Credits would be basically like money.  The
> > only issue is how credits originate, since someone needs to get them
> > before people spend them.  But you could probably make some "official"
> > projects that you will automatically get credits at a certain rate for
> > participating in.
> >
> > Michael Cohen
> >
> > Scott Lawrence wrote:
> >> Oh, that's interesting.  It might actually work... Other comments?
> >>
> >>
> >> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
> >>> Wrong.  You need to have credits to pay people with.  And so you have to
> >>> spend more credits if you want to pay people more.
> >>>
> >>> Michael Cohen
> >>>
> >>> Scott Lawrence wrote:
> >>>> No.  That leads to people trying to make their projects as valuable as
> >>>> possible, and within 3 days, the whole system will be worthless.
> >>>>
> >>>> Please remember that the mailing lists at launchpad don't perform
> >>>> reply-to mangling. Instruct your client accordingly.
> >>>>
> >>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
> >>>>> I would actually award credits on a per-project basis.  Each project
> >>>>> simply chooses how many credits to award for each task.  Your computer
> >>>>> is offered so many credits for so much work; if it finishes, it gets the
> >>>>> creds, and otherwise it doesn't.  Trying to do it based on time is dumb
> >>>>> because we don't care about time, we care about computational power
> >>>>> contributed.  If someone gives me 5000 hours, but in a VM limited to run
> >>>>> at 0.1% of their Pentium 3 CPU, then that isn't worth as much to me as
> >>>>> 5 hours on someone's brand new quad-core powerhouse.
> >>>>>
> >>>>> Michael Cohen
> >>>>>
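A minimal sketch of the per-project credit offer described above, in Python.
The CreditLedger class and its method names are hypothetical, not part of
modred. The rule that a project must already hold the credits it pays out is
an assumption taken from the reply to this message quoted further up.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Hypothetical ledger: credits move from a project to a worker
    only when the offered task is actually finished."""
    balances: dict = field(default_factory=dict)

    def deposit(self, account: str, amount: int) -> None:
        # How credits originate is left open in the thread; this is a stand-in.
        self.balances[account] = self.balances.get(account, 0) + amount

    def pay_for_task(self, project: str, worker: str, offer: int, finished: bool) -> bool:
        """Transfer `offer` credits if the task finished and the project can afford it."""
        if not finished or self.balances.get(project, 0) < offer:
            return False
        self.balances[project] -= offer
        self.deposit(worker, offer)
        return True
```

Time spent never enters the calculation: a slow machine and a fast one earn
the same for the same finished task, which seems to be the point being made.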
> >>>>> Scott Lawrence wrote:
> >>>>>> Some administrators will consider CPU time credits to be very
> >>>>>> important, though. Especially if they want them to be
> >>>>>> buyable/exchangeable for that stink green stuff.
> >>>>>>
> >>>>>> Let's deal with this later - we could just validate the same way we do
> >>>>>> right answer/wrong answer validation.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>>>>>> ---------- Forwarded message ----------
> >>>>>>> From: Frederic Koehler <fkfire@xxxxxxxxx>
> >>>>>>> Date: Mon, Dec 28, 2009 at 10:17 PM
> >>>>>>> Subject: Re: [Modred] Concept stuff
> >>>>>>> To: Scott Lawrence <bytbox@xxxxxxxxx>
> >>>>>>>
> >>>>>>>
> >>>>>>> Network capacity can be protected by batching responses. Normal clients
> >>>>>>> can save several computations and send one big response (like when
> >>>>>>> exiting) - this will have similar bandwidth to big computations but
> >>>>>>> avoid the potential for a really long computation wasting time on a
> >>>>>>> bunch of computers without being solved.
> >>>>>>>
> >>>>>>>
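The batching idea above could look roughly like this in Python. ResultBatcher,
its `send` callback, and the `max_batch` threshold are all hypothetical names
chosen for illustration; the thread does not specify a wire format or a batch
size.

```python
class ResultBatcher:
    """Hypothetical client-side buffer: collect small results and
    send them to the server as one message."""

    def __init__(self, send, max_batch=10):
        self.send = send            # callable taking a list of (task_id, result)
        self.max_batch = max_batch  # assumed threshold, not from the thread
        self.pending = []

    def add(self, task_id, result):
        """Buffer one finished computation; send when the batch is full."""
        self.pending.append((task_id, result))
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        """Send everything buffered so far; call this when the client exits."""
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()
```

Each task stays small, so a dying client loses at most one unsent batch
rather than one long computation.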
> >>>>>>> On Mon, Dec 28, 2009 at 10:14 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>>> * Maybe. But that still makes it too easy to gain false credits.
> >>>>>>>> * Yeah, I'm agreeing with this.  The hub delegates to the servers, and
> >>>>>>>> the servers delegate to clients.
> >>>>>>>> * This will overload network capacity, and I don't like it.  We should
> >>>>>>>> be able to trust clients to make long computations (long=over 30
> >>>>>>>> seconds).  Maybe clients could give servers hints on how long they'll
> >>>>>>>> be on?
> >>>>>>>>
> >>>>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>>>>>>>> Hah, my email died, wow...Wonder what happened to it....
> >>>>>>>>> Anyway, this is something like what I wrote before:
> >>>>>>>>>
> >>>>>>>>> * CPU time credits can be very roughly estimated by averaging response
> >>>>>>>>> time. It's not all _that_ important anyway if nothing is a behemoth task.
> >>>>>>>>>
> >>>>>>>>> * The hub can immediately, upon establishing connection, redirect the
> >>>>>>>>> client to a server. The server will still have to communicate with the
> >>>>>>>>> hub somewhat, but it can send only the stuff that necessarily pertains
> >>>>>>>>> to the hub.
> >>>>>>>>>
> >>>>>>>>> * Jobs should probably be many small tasks to avoid the risk of losing a
> >>>>>>>>> giant computation (since saving computation state is not
> >>>>>>>>> easy/generalizable). Beyond that, sending keep-alive packets is enough
> >>>>>>>>> to know when a client dies.
> >>>>>>>>>
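A rough Python sketch of the keep-alive bookkeeping mentioned in the last
point above. HeartbeatMonitor and the 15-second timeout are assumptions made
for illustration; timestamps are passed in explicitly so the logic does not
depend on a real clock or a real socket.

```python
class HeartbeatMonitor:
    """Hypothetical server-side record of when each client last checked in."""

    def __init__(self, timeout=15.0):
        self.timeout = timeout   # seconds of silence before a client counts as dead (assumed)
        self.last_seen = {}

    def beat(self, client_id, now):
        """Record a keep-alive packet received from client_id at time `now`."""
        self.last_seen[client_id] = now

    def dead_clients(self, now):
        """Return the ids of clients silent for longer than the timeout."""
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)
```

Tasks held by anything returned from dead_clients() would then be candidates
for reassignment.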
> >>>>>>>>>
> >>>>>>>>> On Mon, Dec 28, 2009 at 6:16 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>> What?
> >>>>>>>>>>
> >>>>>>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>>>>>>>>>> On Mon, Dec 28, 2009 at 12:45 AM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>> "What happens when a client disconnects with unfinished work? Is the
> >>>>>>>>>>>> work immediately reassigned, or does the server wait for a specified
> >>>>>>>>>>>> period, etc. This could come up quite a lot because some clients will
> >>>>>>>>>>>> just disconnect as soon as work they submitted is completed."
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ouch.  Good question.  Here's one solution: small tasks (expected time
> >>>>>>>>>>>> <2 seconds) are always assigned to two or more clients/servers.  If
> >>>>>>>>>>>> both disconnect, reassign; if one disconnects, use the other guy's
> >>>>>>>>>>>> answer. For large tasks, if a computer stops regularly checking in every
> >>>>>>>>>>>> 5 or so seconds, give that computer's results to date to another
> >>>>>>>>>>>> computer.  So yeah, I think a client should have to make regular
> >>>>>>>>>>>> reports to a server.
> >>>>>>>>>>>>
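The small-task rule above (assign to two or more machines, reassign only if
all of them vanish) can be sketched in Python as follows. The function name
and the tuple return values are hypothetical. Checking whether two reported
answers agree is a separate concern and is deliberately left out here.

```python
def resolve_small_task(assigned, reported):
    """Decide what to do with a small task handed to several machines.

    assigned: ids of the machines that were given the task (two or more).
    reported: dict mapping machine id -> answer, for the machines that
              finished before disconnecting.
    Returns ("done", answer) or ("reassign", None).
    """
    answers = [reported[m] for m in assigned if m in reported]
    if answers:
        # At least one machine stayed long enough to answer: use its result.
        return ("done", answers[0])
    # Every assigned machine disconnected without reporting.
    return ("reassign", None)
```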
> >>>>>>>>>>>> Here's another problem: how do we tell how many CPU time credits to
> >>>>>>>>>>>> grant a client?  We can't always tell how long a problem should take
> >>>>>>>>>>>> beforehand.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Here's another problem: which computer should handle the clients?  As
> >>>>>>>>>>>> I've been thinking about this, there are three types of computers: the
> >>>>>>>>>>>> single hub, the various dedicated servers (capable of storing
> >>>>>>>>>>>> permanent data), and the clients.  (The hub is necessary - without it,
> >>>>>>>>>>>> the performance of the cluster drastically decreases.) So clients
> >>>>>>>>>>>> connect to the hub, and then the hub directs all computers.  But the
> >>>>>>>>>>>> hub will get overloaded if 100 computers are checking in every 10
> >>>>>>>>>>>> seconds to give it more data (and then the hub has to pass this on to
> >>>>>>>>>>>> other servers for storage, etc...).  So at some point, the hub needs
> >>>>>>>>>>>> to tell the client to talk to the server.  When?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Who wants to create that prototype?
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>>> This is where we start building prototypes. However, just to keep the
> >>>>>>>>>>>>> theoretical side going: I disagree about the privacy issue.  Most
> >>>>>>>>>>>>> operations that would benefit from the CPU time of a cluster (notice
> >>>>>>>>>>>>> I'm not talking about the data storage and reliability benefits, which
> >>>>>>>>>>>>> aren't affected by the presence of clients) are not very private.
> >>>>>>>>>>>>> Rendering nice screensavers ("Electric Sheep", I think that one's
> >>>>>>>>>>>>> called), and hefty data sifting aren't private - who cares about the
> >>>>>>>>>>>>> screensavers, and the data is generally public anyway (of course if it
> >>>>>>>>>>>>> wasn't, it would be marked so).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ray tracing and simulation could be more of an issue.  Hypothetical
> >>>>>>>>>>>>> situation: Alice is simulating how wind will affect her proprietary
> >>>>>>>>>>>>> airplane design.  Naturally, she can't hand off the whole design, or
> >>>>>>>>>>>>> even parts of the design, to random client computers.  This is where
> >>>>>>>>>>>>> the Windows programmer says, "so the client computers can't help
> >>>>>>>>>>>>> Alice."  But that's not true - as a bad example, what if the Modred
> >>>>>>>>>>>>> hub gave to a client computer 80 types of landing gear, and told the
> >>>>>>>>>>>>> computer, not to simulate something, but to solve a general formula
> >>>>>>>>>>>>> that could later be used in the computation in a trivial and quick
> >>>>>>>>>>>>> way?  If that client is evil, it will learn that Alice's airplane has
> >>>>>>>>>>>>> some sort of landing gear.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Somebody needs to create a prototype of a server that can create
> >>>>>>>>>>>>> arbitrary problems in some format, so we can all try to trick it.  I
> >>>>>>>>>>>>> suggest Lisp as the language, but it's up to the implementer.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>> ---------- Forwarded message ----------
> >>>>>>>>>>>>>> From: Frederic Koehler <fkfire@xxxxxxxxx>
> >>>>>>>>>>>>>> Date: Sun, 27 Dec 2009 23:21:28 -0500
> >>>>>>>>>>>>>> Subject: Re: [Modred] Concept stuff
> >>>>>>>>>>>>>> To: Scott Lawrence <bytbox@xxxxxxxxx>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>  * This is sort-of a solution (while obviously less-than-optimal
> >>>>>>>>>>>>>> security, some grid-computing stuff does this, like BOINC); however, it
> >>>>>>>>>>>>>> turns out that this may require custom validation methods - for example,
> >>>>>>>>>>>>>> it's normal for floating point values to be different on different
> >>>>>>>>>>>>>> computers, and the same could apply for other computations.
> >>>>>>>>>>>>>>  * A malicious client would only need to misbehave on certain problems
> >>>>>>>>>>>>>> that a malicious user could designate (or recognize obvious fake
> >>>>>>>>>>>>>> programs), allowing the fake program test to work.
> >>>>>>>>>>>>>>     - A better idea would be to randomly reduplicate some computations
> >>>>>>>>>>>>>> many times - the malicious client wouldn't notice anything, but could
> >>>>>>>>>>>>>> easily be singled out
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>   * Thirdly is mostly the same thing I wrote before - only computations
> >>>>>>>>>>>>>> that are said to be totally unimportant privacy wise could benefit from
> >>>>>>>>>>>>>> client-side computing.
> >>>>>>>>>>>>>>
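The first two points in the list above (floating point answers differing
between machines, and singling out a bad client by duplicating a computation)
can be combined in a short Python sketch. The function names and the
tolerance values are assumptions for illustration; this is not BOINC's
validator, just a tolerance comparison using the standard math.isclose plus a
simple majority rule.

```python
import math


def results_agree(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Treat two floats as the same answer if they differ only by rounding noise."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


def suspicious_clients(results, rel_tol=1e-9):
    """Flag clients whose answer disagrees with most of the other copies.

    results: dict mapping client id -> float, all for the same duplicated
             computation.
    """
    clients = list(results)
    flagged = []
    for c in clients:
        agreeing = sum(
            results_agree(results[c], results[other], rel_tol)
            for other in clients
            if other != c
        )
        # Flag a client that fewer than half of its peers agree with.
        if agreeing < (len(clients) - 1) / 2:
            flagged.append(c)
    return sorted(flagged)
```

Exact equality would wrongly flag honest machines here: 0.1 + 0.2 and 0.3 are
different floats, but results_agree() accepts them as the same answer.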
> >>>>>>>>>>>>>> So I really think that client-side computation is only a good idea for
> >>>>>>>>>>>>>> a small subset of problems (like the type that there already exist
> >>>>>>>>>>>>>> massive grid computing solutions for, like SETI@HOME)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I want clients to be used for computation, and I want maximum
> >>>>>>>>>>>>>>> privacy+security given that restriction.  Some ideas:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> With a large network, two computers can perform the same computation.
> >>>>>>>>>>>>>>> Furthermore, a smart modred hub can give fake problems to clients,
> >>>>>>>>>>>>>>> just to make sure that they're operating correctly.  A client that
> >>>>>>>>>>>>>>> isn't operating correctly gets cut. (No second chances! A program
> >>>>>>>>>>>>>>> could exploit that!)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> If a user specifies a certain bit of data (SSN, for instance) as
> >>>>>>>>>>>>>>> highly sensitive, modred should know not to hand off that computation
> >>>>>>>>>>>>>>> to a client. (If it does by accident, it certainly should never hand
> >>>>>>>>>>>>>>> off the data.) privacy++
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In all cases, computations should be anonymous. privacy++
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Other ideas?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>>> The idea for client-side computation implies that we have
> >>>>>>>>>>>>>>>> highly-trusted clients... (we know they won't provide invalid answers)
> >>>>>>>>>>>>>>>> Otherwise, client-side computation requires verifying answers and so is
> >>>>>>>>>>>>>>>> only useful for a few NP-ish problems. In addition (assuming trusted
> >>>>>>>>>>>>>>>> clients).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Also, it means that, since computations can contain sensitive data,
> >>>>>>>>>>>>>>>> the ability to spread the computation is limited - unless we know the
> >>>>>>>>>>>>>>>> computation is not user-sensitive, it can only try to use the user's
> >>>>>>>>>>>>>>>> client(s). This way we also know that the client has no interest in
> >>>>>>>>>>>>>>>> sabotaging answers to mess with other users (except to exploit
> >>>>>>>>>>>>>>>> server-side weaknesses, which is inevitable).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>>>> Here is what, as I envision it, will make modred unique (and hard):
> >>>>>>>>>>>>>>>>>  * Support for clients who can come and leave, lending CPU time and
> >>>>>>>>>>>>>>>>> using CPU time as they choose.  There are some clusters that support
> >>>>>>>>>>>>>>>>> this, but not very many.
> >>>>>>>>>>>>>>>>>  * Support for computers participating across the internet.  This goes
> >>>>>>>>>>>>>>>>> along with the previous part, but remember we need security to make
> >>>>>>>>>>>>>>>>> this worth anything. This also means that user data could potentially
> >>>>>>>>>>>>>>>>> be passed to untrusted computers - we need a way to prevent this.
> >>>>>>>>>>>>>>>>>  * The ability for clients to run on any OS, using perl, python, java,
> >>>>>>>>>>>>>>>>> or (on unix systems) C and C++ (servers and the hub will need to run
> >>>>>>>>>>>>>>>>> on linux or at least another unix, or a dedicated OS which we may
> >>>>>>>>>>>>>>>>> decide to write)
> >>>>>>>>>>>>>>>>>  * Modred has great ease of use because it acts as a single unified
> >>>>>>>>>>>>>>>>> computer - a special client program exists that allows one to log in,
> >>>>>>>>>>>>>>>>> access and edit files, etc...  This is very close to unique - google
> >>>>>>>>>>>>>>>>> has it, though
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Because of that last point, many OS design issues should come up when
> >>>>>>>>>>>>>>>>> we code modred. (I think Freddy pointed this out?) Thus, we have a
> >>>>>>>>>>>>>>>>> chance to fix flaws in standard unix, incorporating plan 9-type stuff
> >>>>>>>>>>>>>>>>> (google it and read about it - Plan 9 from Bell Labs, the way the
> >>>>>>>>>>>>>>>>> future of unix was) while also creating an actually usable user
> >>>>>>>>>>>>>>>>> interface. (No offense, but to a newbie non-super-technical user,
> >>>>>>>>>>>>>>>>> linux is a bit harsh...)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Some implementation questions and ideas:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  - how will updates be handled?  Remember we've got 200 computers
> >>>>>>>>>>>>>>>>> potentially, some of which might be clients that want to participate
> >>>>>>>>>>>>>>>>> in multiple clusters.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  - maybe we should have programs not include front ends.  Instead, the
> >>>>>>>>>>>>>>>>> modred software creates a front-end from the program's self
> >>>>>>>>>>>>>>>>> description.  This would enforce a consistent user interface if we
> >>>>>>>>>>>>>>>>> could implement it well
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  - how can we keep users from being able to snoop on each others'
> >>>>>>>>>>>>>>>>> data?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> That's just a sample to get people thinking.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>> Before diving in to specifics about the implementation I think we need
> >>>>>>>>>>>>>>>>>> to decide how we want modred to be different from (read: better than)
> >>>>>>>>>>>>>>>>>> existing bootable cluster environments. Here is a short list to check
> >>>>>>>>>>>>>>>>>> out:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Bootable Cluster CD (http://bccd.net/) - folks presented this at SC09
> >>>>>>>>>>>>>>>>>> in Portland, it was pretty neat stuff. Packed with education /
> >>>>>>>>>>>>>>>>>> debugging / visualization features
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) - very
> >>>>>>>>>>>>>>>>>> trivially simple way to transform an existing unix lab into a cluster
> >>>>>>>>>>>>>>>>>> resource
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Perceus/warewulf (http://www.perceus.org/portal/) - a lot of other
> >>>>>>>>>>>>>>>>>> sites made reference to this, haven't read too much about it
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> What specifically do you want to improve over any of these?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>>> So, as far as I understand this project, the idea is to build
> >>>>>>>>>>>>>>>>>>> both a client library and a program using the library to do clustering
> >>>>>>>>>>>>>>>>>>> stuff, along with matching server/hub foo (the library might be the
> >>>>>>>>>>>>>>>>>>> same or whatever, not important).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> So from this understanding, it seems that the system should provide
> >>>>>>>>>>>>>>>>>>> some basic pseudo-operating system stuff and programs can build on
> >>>>>>>>>>>>>>>>>>> that, just like they would normally build on their local libc/kernel
> >>>>>>>>>>>>>>>>>>> and stuff.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> So (I sure like the word "so" today...) if we want the type of general
> >>>>>>>>>>>>>>>>>>> os-like stuff it seems there needs to be support for:
> >>>>>>>>>>>>>>>>>>>    * A simple message passing model - abstract away all the TCP-foo,
> >>>>>>>>>>>>>>>>>>> maybe use existing foo here (obviously needs fleshing out)
> >>>>>>>>>>>>>>>>>>>    * Permanent storage IO (clone the unix write(), read(), open() and
> >>>>>>>>>>>>>>>>>>> sync() model, or maybe just use one of the existing database-ish nosql
> >>>>>>>>>>>>>>>>>>> things out there)
> >>>>>>>>>>>>>>>>>>>            - Unix-ish model - you create your data hunk, say you want
> >>>>>>>>>>>>>>>>>>> all this stuff in it, then after sync() we know it's actually written
> >>>>>>>>>>>>>>>>>>> somewhere on a hard-drive, and other things can read it too
> >>>>>>>>>>>>>>>>>>>            - Unless this isn't in fact needed (but I assume it is)
> >>>>>>>>>>>>>>>>>>>            - Also need to figure out if it's filesystem-ish foo
> >>>>>>>>>>>>>>>>>>> (hierarchical) we want or more relational database-ish stuff
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>    * A task delegation model - some type of map/reduce-ish stuff
> >>>>>>>>>>>>>>>>>>>           - Servers have a few built-in computations, and client
> >>>>>>>>>>>>>>>>>>> utilizes them?
> >>>>>>>>>>>>>>>>>>>           - Or more complex, servers run sandboxed computational code?
> >>>>>>>>>>>>>>>>>>>    * A security system?
> >>>>>>>>>>>>>>>>>>>         - Needs fleshing out
> >>>>>>>>>>>>>>>>>>>         - Presumably what the "hub" manages - it's the trusted thing
> >>>>>>>>>>>>>>>>>>>         - Obviously, not everybody is allowed to use the cluster for
> >>>>>>>>>>>>>>>>>>> computation, not everybody can find out what everybody else is doing,
> >>>>>>>>>>>>>>>>>>> etc.
> >>>>>>>>>>>>>>>>>>>       - But also, is there a limit on storage, are some things
> >>>>>>>>>>>>>>>>>>> prioritized over others, ?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Theoretically, servers are written to provide the io backend and to
> >>>>>>>>>>>>>>>>>>> allow for task delegation, clients use the api, although hub has its
> >>>>>>>>>>>>>>>>>>> work cut out delegating all the file io and figuring out what the
> >>>>>>>>>>>>>>>>>>> state of that is.
> >>>>>>>>>>>>>>>>>>> On top of some mixture of this, one could build a simple unix-ish
> >>>>>>>>>>>>>>>>>>> pseudo-cli, theoretically, as well as real software.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Anyway, before actually doing anything, people should read about PVM
> >>>>>>>>>>>>>>>>>>> (Parallel Virtual Machine) and the like (maybe also Hadoop and other
> >>>>>>>>>>>>>>>>>>> foo-ish stuff) so Modred isn't just a bad clone of it
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Anyway, (yes, twice in a row!), I figured _someone_ had to respond to
> >>>>>>>>>>>>>>>>>>> Scott, otherwise he'd feel all lonely and sad :P Now he can have a
> >>>>>>>>>>>>>>>>>>> warm fuzzy feeling of deep confusion and uncertainty instead :P
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >>>>>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
> >>>>>>>>>>>>>>>>>>>> From: Scott Lawrence <bytbox@xxxxxxxxx>
> >>>>>>>>>>>>>>>>>>>> Date: Fri, 25 Dec 2009 19:20:13 -0500
> >>>>>>>>>>>>>>>>>>>> Subject: Design Overview
> >>>>>>>>>>>>>>>>>>>> To: modred <modred@xxxxxxxxxxxxxxxxxxx>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm going to assume that everyone understands the basic concepts for
> >>>>>>>>>>>>>>>>>>>> modred: a set of networked computers (by 'networked' I mean, they're
> >>>>>>>>>>>>>>>>>>>> all on the internet), divided for the sake of discussion into three
> >>>>>>>>>>>>>>>>>>>> classes: the 'hub' (the dude in charge, which computers that want to
> >>>>>>>>>>>>>>>>>>>> join connect to), the 'servers' (dedicated computers that can be
> >>>>>>>>>>>>>>>>>>>> pretty much relied on not to go down, although redundancy is always
> >>>>>>>>>>>>>>>>>>>> nice), and the 'clients' (computers that send in requests and can be
> >>>>>>>>>>>>>>>>>>>> used for spare CPU cycles).
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Ok, so much for assumptions... :-)
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Things *I* think any design should emphasize:
> >>>>>>>>>>>>>>>>>>>>  * security.
> >>>>>>>>>>>>>>>>>>>>  * relative ease of use, while retaining significant power.
> >>>>>>>>>>>>>>>>>>>> Challenging.  In particular, it should be possible to set up a modred
> >>>>>>>>>>>>>>>>>>>> network in under an hour, provided the computers are already set up.
> >>>>>>>>>>>>>>>>>>>>  * along with the previous bullet point, having an interface that lets
> >>>>>>>>>>>>>>>>>>>> one use the entire network like a single computer.  This is sort of
> >>>>>>>>>>>>>>>>>>>> like the way google docs works, except the cloud is private
> >>>>>>>>>>>>>>>>>>>>  * therefore, it should be a multi-user system with well-designed
> >>>>>>>>>>>>>>>>>>>> privileges etc...
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I'm not going to discuss my implementation ideas, let's hear others
> >>>>>>>>>>>>>>>>>>>> first.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>> Scott Lawrence
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Webmaster
> >>>>>>>>>>>>>>>>>>>> The Blair Robot Project
> >>>>>>>>>>>>>>>>>>>> Montgomery Blair High School
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>>>>>>>  Mailing list: https://launchpad.net/~modred
> >>>>>>>>>>>>>>>>>>>  Post to     : modred@xxxxxxxxxxxxxxxxxxx
> >>>>>>>>>>>>>>>>>>>  Unsubscribe : https://launchpad.net/~modred
> >>>>>>>>>>>>>>>>>>>  More help   : https://help.launchpad.net/ListHelp
> >>>>>>>>>>>>>>>>>>>
