modred team mailing list archive

Re: Fwd: Concept stuff

To: Michael Cohen <gnurdux@xxxxxxxxx>
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Mon, 28 Dec 2009 22:54:52 -0500
Cc: modred@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B397CDF.8060109@gmail.com>
I like this. And actually, the cluster could use the designated values
to estimate how long tasks will take.
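
A rough sketch of what I mean, in Python. This is only an illustration under my own assumptions: the class and method names are made up, nothing like this exists in modred yet, and it assumes a project's designated credits scale roughly linearly with the work in a task. The cluster keeps a running seconds-per-credit rate from finished tasks and multiplies it by a new task's credits.

```python
# Hypothetical sketch: none of these names exist in modred.
class CreditTimeEstimator:
    """Estimate task duration from the credits a project designated for it."""

    def __init__(self, default_seconds_per_credit=1.0):
        # Fallback rate used until at least one task has completed (assumed value).
        self.default_rate = default_seconds_per_credit
        self.total_credits = 0.0
        self.total_seconds = 0.0

    def record(self, credits, seconds):
        """Record a finished task: its designated credits and observed run time."""
        if credits <= 0 or seconds < 0:
            raise ValueError("credits must be positive and seconds non-negative")
        self.total_credits += credits
        self.total_seconds += seconds

    def seconds_per_credit(self):
        """Average observed rate, or the fallback if nothing has finished yet."""
        if self.total_credits == 0:
            return self.default_rate
        return self.total_seconds / self.total_credits

    def estimate(self, credits):
        """Expected run time in seconds for a task worth `credits`."""
        return credits * self.seconds_per_credit()


est = CreditTimeEstimator()
est.record(credits=10, seconds=20.0)
est.record(credits=30, seconds=60.0)
print(est.estimate(5))  # 5 credits at the observed 2 seconds per credit
```

A single global rate is the simplest thing that could work; a per-project or per-client rate would probably be more accurate, but that is a design question for the blueprint.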


Mikey: can you go to launchpad and make a blueprint for this?

On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
> The idea here is that if people are smart, stuff should basically
> acquire a "market value."  Credits would be basically like money.  The
> only issue is how credits originate, since someone needs to get them
> before people spend them.  But you could probably make some "official"
> projects that you will automatically get credits at a certain rate for
> participating in.
>
> Michael Cohen
>
> Scott Lawrence wrote:
>> Oh, that's interesting.  It might actually work... Other comments?
>>
>>
>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>> Wrong.  You need to have credits to pay people with.  And so you have to
>>> spend more credits if you want to pay people more.
>>>
>>> Michael Cohen
>>>
>>> Scott Lawrence wrote:
>>>> No.  That leads to people trying to make their projects as valuable as
>>>> possible, and within 3 days, the whole system will be worthless.
>>>>
>>>> Please remember that the mailing lists at launchpad don't perform
>>>> reply-to mangling. Instruct your client accordingly.
>>>>
>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>> I would actually award credits on a per-project basis.  Each project
>>>>> simply chooses how many credits to award for each task.  Your computer
>>>>> is offered so many credits for so much work; if it finishes it gets the
>>>>> creds and otherwise not.  Trying to do it based on time is dumb because
>>>>> we don't care about time, we care about computational power
>>>>> contributed.
>>>>>   If someone gives me 5000 hours, but in a VM limited to run at .1% of
>>>>> their Pentium 3 CPU, then that isn't worth as much to me as 5 hours on
>>>>> someone's brand new quad core powerhouse.
>>>>>
>>>>> Michael Cohen
>>>>>
>>>>> Scott Lawrence wrote:
>>>>>> Some administrators will consider CPU time credits to be very
>>>>>> important, though. Especially if they want them to be
>>>>>> buyable/exchangeable for that stink green stuff.
>>>>>>
>>>>>> Let's deal with this later - we could just validate the same way we do
>>>>>> right answer/wrong answer validation.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Frederic Koehler <fkfire@xxxxxxxxx>
>>>>>>> Date: Mon, Dec 28, 2009 at 10:17 PM
>>>>>>> Subject: Re: [Modred] Concept stuff
>>>>>>> To: Scott Lawrence <bytbox@xxxxxxxxx>
>>>>>>>
>>>>>>>
>>>>>>> Network capacity can be protected by batching responses. Normal
>>>>>>> clients can save several computations and send one big response
>>>>>>> (like when exiting) - this will have similar bandwidth to big
>>>>>>> computations but avoid the potential for a really long computation
>>>>>>> wasting time on a bunch of computers without being solved.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 28, 2009 at 10:14 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> * Maybe. But that still makes it too easy to gain false credits.
>>>>>>>> * Yeah, I'm agreeing with this.  The hub delegates to the servers,
>>>>>>>> and the servers delegate to clients.
>>>>>>>> * This will overload network capacity, and I don't like it.  We
>>>>>>>> should be able to trust clients to make long computations
>>>>>>>> (long=over 30 seconds).  Maybe clients could give servers hints on
>>>>>>>> how long they'll be on?
>>>>>>>>
>>>>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>>>>>>>> Hah, my email died, wow... Wonder what happened to it...
>>>>>>>>> Anyway, this is something like what I wrote before:
>>>>>>>>>
>>>>>>>>> * CPU time credits can be very roughly estimated by averaging
>>>>>>>>> response time. It's not all _that_ important anyway if nothing is
>>>>>>>>> a behemoth task.
>>>>>>>>>
>>>>>>>>> * The hub can immediately, upon establishing connection, redirect
>>>>>>>>> the client to a server. The server will still have to communicate
>>>>>>>>> with the hub somewhat, but it can send only the stuff that
>>>>>>>>> necessarily pertains to the hub.
>>>>>>>>>
>>>>>>>>> * Jobs should probably be many small tasks to avoid the risk of
>>>>>>>>> losing a giant computation (since saving computation state is not
>>>>>>>>> easy/generalizable). Beyond that, sending keep-alive packets is
>>>>>>>>> enough to know when a client dies.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 28, 2009 at 6:16 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>> What?
>>>>>>>>>>
>>>>>>>>>> On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>>>>>>>>>> On Mon, Dec 28, 2009 at 12:45 AM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>> "What happens when a client disconnects with unfinished work? Is
>>>>>>>>>>>> the work immediately reassigned, or does the server wait for a
>>>>>>>>>>>> specified period, etc. This could come up quite a lot because
>>>>>>>>>>>> some clients will just disconnect as soon as work they submitted
>>>>>>>>>>>> is completed."
>>>>>>>>>>>>
>>>>>>>>>>>> Ouch.  Good question.  Here's one solution: small tasks (expected
>>>>>>>>>>>> time <2 seconds) are always assigned to two or more
>>>>>>>>>>>> clients/servers. If both disconnect, reassign; if one
>>>>>>>>>>>> disconnects, use the other guy's answer. For large tasks, if a
>>>>>>>>>>>> computer stops regularly checking in every 5 or so seconds, give
>>>>>>>>>>>> that computer's results to date to another computer.  So yeah, I
>>>>>>>>>>>> think a client should have to make regular reports to a server.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's another problem: how do we tell how many CPU time credits
>>>>>>>>>>>> to grant a client?  We can't always tell how long a problem
>>>>>>>>>>>> should take beforehand.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's another problem: which computer should handle the
>>>>>>>>>>>> clients? As I've been thinking about this, there are three types
>>>>>>>>>>>> of computers: the single hub, the various dedicated servers
>>>>>>>>>>>> (capable of storing permanent data), and the clients.  (The hub
>>>>>>>>>>>> is necessary - without it, the performance of the cluster
>>>>>>>>>>>> drastically decreases.) So clients connect to the hub, and then
>>>>>>>>>>>> the hub directs all computers.  But the hub will get overloaded
>>>>>>>>>>>> if 100 computers are checking in every 10 seconds to give it
>>>>>>>>>>>> more data (and then the hub has to pass this on to other servers
>>>>>>>>>>>> for storage, etc...).  So at some point, the hub needs to tell
>>>>>>>>>>>> the client to talk to the server.  When?
>>>>>>>>>>>>
>>>>>>>>>>>> Who wants to create that prototype?
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>>> This is where we start building prototypes. However, just to keep
>>>>>>>>>>>>> the theoretical side going: I disagree about the privacy issue.
>>>>>>>>>>>>> Most operations that would benefit from the CPU time of a cluster
>>>>>>>>>>>>> (notice I'm not talking about the data storage and reliability
>>>>>>>>>>>>> benefits, which aren't affected by the presence of clients) are
>>>>>>>>>>>>> not very private. Rendering nice screensavers ("Electric Sheep",
>>>>>>>>>>>>> I think that one's called), and hefty data sifting aren't private
>>>>>>>>>>>>> - who cares about the screensavers, and the data is generally
>>>>>>>>>>>>> public anyway (of course if it wasn't, it would be marked so).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ray tracing and simulation could be more of an issue.
>>>>>>>>>>>>> Hypothetical situation: Alice is simulating how wind will affect
>>>>>>>>>>>>> her proprietary airplane design.  Naturally, she can't hand off
>>>>>>>>>>>>> the whole design, or even parts of the design, to random client
>>>>>>>>>>>>> computers.  This is where the windows programmer says, "so the
>>>>>>>>>>>>> client computers can't help Alice."  But that's not true - as a
>>>>>>>>>>>>> bad example, what if the Modred hub gave to a client computer 80
>>>>>>>>>>>>> types of landing gear, and told the computer, not to simulate
>>>>>>>>>>>>> something, but to solve a general formula that could later be
>>>>>>>>>>>>> used in the computation in a trivial and quick way?  If that
>>>>>>>>>>>>> client is evil, it will learn that Alice's airplane has some sort
>>>>>>>>>>>>> of landing gear.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Somebody needs to create a prototype of a server that can create
>>>>>>>>>>>>> arbitrary problems in some format, so we can all try to trick it.
>>>>>>>>>>>>> I suggest lisp as the language, but it's up to the implementer.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>> From: Frederic Koehler <fkfire@xxxxxxxxx>
>>>>>>>>>>>>>> Date: Sun, 27 Dec 2009 23:21:28 -0500
>>>>>>>>>>>>>> Subject: Re: [Modred] Concept stuff
>>>>>>>>>>>>>> To: Scott Lawrence <bytbox@xxxxxxxxx>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  * This is sort-of a solution (while obviously less-than-optimal
>>>>>>>>>>>>>> security, some grid-computing stuff does this, like BOINC),
>>>>>>>>>>>>>> however, it turns out that this may require custom validation
>>>>>>>>>>>>>> methods - for example, it's normal for floating point values to
>>>>>>>>>>>>>> be different on different computers, and the same could apply for
>>>>>>>>>>>>>> other computations.
>>>>>>>>>>>>>>  * A malicious client would only need to misbehave on certain
>>>>>>>>>>>>>> problems that a malicious user could designate (or recognize
>>>>>>>>>>>>>> obvious fake programs), allowing the fake program test to work.
>>>>>>>>>>>>>>     - A better idea would be to randomly reduplicate some
>>>>>>>>>>>>>> computations many times - the malicious client wouldn't notice
>>>>>>>>>>>>>> anything, but could easily be singled out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   * Thirdly is mostly the same thing I wrote before - only
>>>>>>>>>>>>>> computations that are said to be totally unimportant privacy wise
>>>>>>>>>>>>>> could benefit from client-side computing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So I really think that client-side computation is only a good
>>>>>>>>>>>>>> idea for a small subset of problems (like the type that there
>>>>>>>>>>>>>> already exist massive grid computing solutions for, like
>>>>>>>>>>>>>> SETI@HOME)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I want clients to be used for computation, and I want maximum
>>>>>>>>>>>>>>> privacy+security given that restriction.  Some ideas:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With a large network, two computers can perform the same
>>>>>>>>>>>>>>> computation. Furthermore, a smart modred hub can give fake
>>>>>>>>>>>>>>> problems to clients, just to make sure that they're operating
>>>>>>>>>>>>>>> correctly.  A client that isn't operating correctly gets cut.
>>>>>>>>>>>>>>> (No second chances! A program could exploit that!)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If a user specifies a certain bit of data (SSN, for instance) as
>>>>>>>>>>>>>>> highly sensitive, modred should know not to hand off that
>>>>>>>>>>>>>>> computation to a client. (If it does by accident, it certainly
>>>>>>>>>>>>>>> should never hand off the data.) privacy++
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In all cases, computations should be anonymous. privacy++
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Other ideas?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>> The idea for client-side computation implies that we have
>>>>>>>>>>>>>>>> highly-trusted clients... (we know they won't provide invalid
>>>>>>>>>>>>>>>> answers) Otherwise, client-side computation requires verifying
>>>>>>>>>>>>>>>> answers and so is only useful for a few NP-ish problems. In
>>>>>>>>>>>>>>>> addition (assuming trusted clients).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, it means that, since computations can contain sensitive
>>>>>>>>>>>>>>>> data, the ability to spread the computation is limited - unless
>>>>>>>>>>>>>>>> we know the computation is not user-sensitive, it can only try
>>>>>>>>>>>>>>>> to use the user's client(s). This way we also know that the
>>>>>>>>>>>>>>>> client has no interest in sabotaging answers to mess with other
>>>>>>>>>>>>>>>> users (except to exploit server-side weaknesses, which is
>>>>>>>>>>>>>>>> inevitable).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>> Here is what, as I envision it, will make modred unique (and
>>>>>>>>>>>>>>>>> hard):
>>>>>>>>>>>>>>>>>  * Support for clients who can come and leave, lending CPU time
>>>>>>>>>>>>>>>>> and using CPU time as they choose.  There are some clusters
>>>>>>>>>>>>>>>>> that support this, but not very many.
>>>>>>>>>>>>>>>>>  * Support for computers participating across the internet.
>>>>>>>>>>>>>>>>> This goes along with the previous part, but remember we need
>>>>>>>>>>>>>>>>> security to make this worth anything. This also means that user
>>>>>>>>>>>>>>>>> data could potentially be passed to untrusted computers - we
>>>>>>>>>>>>>>>>> need a way to prevent this.
>>>>>>>>>>>>>>>>>  * The ability for clients to run on any OS, using perl, python,
>>>>>>>>>>>>>>>>> java, or (on unix systems) C and C++ (servers and the hub will
>>>>>>>>>>>>>>>>> need to run on linux or at least another unix, or a dedicated
>>>>>>>>>>>>>>>>> OS which we may decide to write)
>>>>>>>>>>>>>>>>>  * Modred has great ease of use because it acts as a single
>>>>>>>>>>>>>>>>> unified computer - a special client program exists that allows
>>>>>>>>>>>>>>>>> one to log in, access and edit files, etc...  This is very
>>>>>>>>>>>>>>>>> close to unique - google has it, though
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Because of that last point, many OS design issues should come
>>>>>>>>>>>>>>>>> up when we code modred. (I think Freddy pointed this out?)
>>>>>>>>>>>>>>>>> Thus, we have a chance to fix flaws in standard unix,
>>>>>>>>>>>>>>>>> incorporating plan 9-type stuff (google it and read about it -
>>>>>>>>>>>>>>>>> Plan 9 from Bell Labs, the way the future of unix was) while
>>>>>>>>>>>>>>>>> also creating an actually usable user interface. (No offense,
>>>>>>>>>>>>>>>>> but to a newbie non-super-technical user, linux is a bit
>>>>>>>>>>>>>>>>> harsh...)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Some implementation questions and ideas:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  - how will updates be handled?  Remember we've got 200
>>>>>>>>>>>>>>>>> computers potentially, some of which might be clients that want
>>>>>>>>>>>>>>>>> to participate in multiple clusters.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  - maybe we should have programs not include front ends.
>>>>>>>>>>>>>>>>> Instead, the modred software creates a front-end from the
>>>>>>>>>>>>>>>>> program's self description.  This would enforce a consistent
>>>>>>>>>>>>>>>>> user interface if we could implement it well
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  - how can we keep users from being able to snoop on each
>>>>>>>>>>>>>>>>> others' data?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That's just a sample to get people thinking.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>> Before diving in to specifics about the implementation I think
>>>>>>>>>>>>>>>>>> we need to decide how we want modred to be different from
>>>>>>>>>>>>>>>>>> (read: better than) existing bootable cluster environments.
>>>>>>>>>>>>>>>>>> Here is a short list to check out:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Bootable Cluster CD (http://bccd.net/) - folks presented this
>>>>>>>>>>>>>>>>>> at SC09 in portland, it was pretty neat stuff. Packed with
>>>>>>>>>>>>>>>>>> education / debugging / visualization features
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) -
>>>>>>>>>>>>>>>>>> very trivially simple way to transform an existing unix lab
>>>>>>>>>>>>>>>>>> into a cluster resource
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Perceus/warewulf (http://www.perceus.org/portal/) - a lot of
>>>>>>>>>>>>>>>>>> other sites made reference to this, haven't read too much
>>>>>>>>>>>>>>>>>> about it
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What specifically do you want to improve over any of these?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>> So, as far as I understand this project, the idea is to build
>>>>>>>>>>>>>>>>>>> both a client library and a program using the library to do
>>>>>>>>>>>>>>>>>>> clustering stuff, along with matching server/hub foo (the
>>>>>>>>>>>>>>>>>>> library might be the same or whatever, not important).
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So from this understanding, it seems that the system should
>>>>>>>>>>>>>>>>>>> provide some basic pseudo-operating system stuff and programs
>>>>>>>>>>>>>>>>>>> can build on that, just like they would normally build on
>>>>>>>>>>>>>>>>>>> their local libc/kernel and stuff.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So (I sure like the word "so" today...) if we want the type
>>>>>>>>>>>>>>>>>>> of general os-like stuff it seems there needs to be support
>>>>>>>>>>>>>>>>>>> for:
>>>>>>>>>>>>>>>>>>>    * A simple message passing model - abstract away all the
>>>>>>>>>>>>>>>>>>> TCP-foo, maybe use existing foo here (obviously needs fleshing
>>>>>>>>>>>>>>>>>>> out)
>>>>>>>>>>>>>>>>>>>    * Permanent storage IO (clone the unix write(), read(),
>>>>>>>>>>>>>>>>>>> open() and sync() model, or maybe just use one of the existing
>>>>>>>>>>>>>>>>>>> database-ish nosql things out there)
>>>>>>>>>>>>>>>>>>>            - Unix-ish model - you create your data hunk, say
>>>>>>>>>>>>>>>>>>> you want all this stuff in it, then after sync() we know it's
>>>>>>>>>>>>>>>>>>> actually written somewhere on a hard-drive, and other things
>>>>>>>>>>>>>>>>>>> can read it too
>>>>>>>>>>>>>>>>>>>            - Unless this isn't in fact needed (but I assume
>>>>>>>>>>>>>>>>>>> it is)
>>>>>>>>>>>>>>>>>>>            - Also need to figure out if it's filesystem-ish
>>>>>>>>>>>>>>>>>>> foo (hierarchical) we want or more relational database-ish
>>>>>>>>>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    * A task delegation model - some type of map/reduce-ish
>>>>>>>>>>>>>>>>>>> stuff
>>>>>>>>>>>>>>>>>>>           - Servers have a few built-in computations, and
>>>>>>>>>>>>>>>>>>> client utilizes them?
>>>>>>>>>>>>>>>>>>>           - Or more complex, servers run sandboxed
>>>>>>>>>>>>>>>>>>> computational code?
>>>>>>>>>>>>>>>>>>>    * A security system?
>>>>>>>>>>>>>>>>>>>         - Needs fleshing out
>>>>>>>>>>>>>>>>>>>         - Presumably what the "hub" manages - it's the
>>>>>>>>>>>>>>>>>>> trusted thing
>>>>>>>>>>>>>>>>>>>         - Obviously, not everybody is allowed to use the
>>>>>>>>>>>>>>>>>>> cluster for computation, not everybody can find out what
>>>>>>>>>>>>>>>>>>> everybody else is doing, etc.
>>>>>>>>>>>>>>>>>>>         - But also, is there a limit on storage, are some
>>>>>>>>>>>>>>>>>>> things prioritized over others?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Theoretically, servers are written to provide the io backend
>>>>>>>>>>>>>>>>>>> and to allow for task delegation, clients use the api,
>>>>>>>>>>>>>>>>>>> although hub has its work cut out delegating all the file io
>>>>>>>>>>>>>>>>>>> and figuring out what the state of that is.
>>>>>>>>>>>>>>>>>>> On top of some mixture of this, one could build a simple
>>>>>>>>>>>>>>>>>>> unix-ish pseudo-cli, theoretically, as well as real software.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anyway, before actually doing anything, people should read
>>>>>>>>>>>>>>>>>>> about PVM (Parallel Virtual Machine) and the like (maybe also
>>>>>>>>>>>>>>>>>>> Hadoop and other foo-ish stuff) so Modred isn't just a bad
>>>>>>>>>>>>>>>>>>> clone of it
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Anyway, (yes, twice in a row!), I figured _someone_ had to
>>>>>>>>>>>>>>>>>>> respond to Scott, otherwise he'd feel all lonely and sad :P
>>>>>>>>>>>>>>>>>>> Now he can have a warm fuzzy feeling of deep confusion and
>>>>>>>>>>>>>>>>>>> uncertainty instead :P
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>>>>>> ---------- Forwarded message ----------
>>>>>>>>>>>>>>>>>>>> From: Scott Lawrence <bytbox@xxxxxxxxx>
>>>>>>>>>>>>>>>>>>>> Date: Fri, 25 Dec 2009 19:20:13 -0500
>>>>>>>>>>>>>>>>>>>> Subject: Design Overview
>>>>>>>>>>>>>>>>>>>> To: modred <modred@xxxxxxxxxxxxxxxxxxx>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm going to assume that everyone understands the basic
>>>>>>>>>>>>>>>>>>>> concepts for modred: a set of networked computers (by
>>>>>>>>>>>>>>>>>>>> 'networked' I mean, they're all on the internet), divided for
>>>>>>>>>>>>>>>>>>>> the sake of discussion into three classes: the 'hub' (the
>>>>>>>>>>>>>>>>>>>> dude in charge, who computers who want to join connect to),
>>>>>>>>>>>>>>>>>>>> the 'servers' (dedicated computers that can be pretty much
>>>>>>>>>>>>>>>>>>>> relied on not to go down, although redundancy is always
>>>>>>>>>>>>>>>>>>>> nice), and the 'clients' (computers that send in requests
>>>>>>>>>>>>>>>>>>>> and can be used for spare CPU cycles).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ok, so much for assumptions... :-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Things *I* think any design should emphasize:
>>>>>>>>>>>>>>>>>>>>  * security.
>>>>>>>>>>>>>>>>>>>>  * relative ease of use, while retaining significant power.
>>>>>>>>>>>>>>>>>>>> Challenging.  In particular, it should be possible to set up
>>>>>>>>>>>>>>>>>>>> a modred network in under an hour, provided the computers
>>>>>>>>>>>>>>>>>>>> are already set up.
>>>>>>>>>>>>>>>>>>>>  * along with the previous bullet point, having an interface
>>>>>>>>>>>>>>>>>>>> that lets one use the entire network like a single computer.
>>>>>>>>>>>>>>>>>>>> This is sort of like the way google docs works, except the
>>>>>>>>>>>>>>>>>>>> cloud is private
>>>>>>>>>>>>>>>>>>>>  * therefore, it should be a multi-user system with
>>>>>>>>>>>>>>>>>>>> well-designed privileges etc...
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm not going to discuss my implementation ideas, let's hear
>>>>>>>>>>>>>>>>>>>> others first.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Scott Lawrence
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Webmaster
>>>>>>>>>>>>>>>>>>>> The Blair Robot Project
>>>>>>>>>>>>>>>>>>>> Montgomery Blair High School
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>  Mailing list: https://launchpad.net/~modred
>>>>>>>>>>>>>>>>>>>  Post to     : modred@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>>>>>>>>>  Unsubscribe : https://launchpad.net/~modred
>>>>>>>>>>>>>>>>>>>  More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>>>>>>>>>>


-- 
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School

Follow ups

Re: Fwd: Concept stuff
From: Frederic Koehler, 2009-12-29

References

Concept stuff
From: Frederic Koehler, 2009-12-26

Re: Concept stuff
From: Scott Lawrence, 2009-12-29

Fwd: Concept stuff
From: Frederic Koehler, 2009-12-29

Re: Fwd: Concept stuff
From: Scott Lawrence, 2009-12-29

Re: Fwd: Concept stuff
From: Scott Lawrence, 2009-12-29

Re: Fwd: Concept stuff
From: Michael Cohen, 2009-12-29

Re: Fwd: Concept stuff
From: Scott Lawrence, 2009-12-29

Re: Fwd: Concept stuff
From: Michael Cohen, 2009-12-29