modred team mailing list archive

Re: Concept stuff

To: modred <modred@xxxxxxxxxxxxxxxxxxx>
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Mon, 28 Dec 2009 00:45:13 -0500
In-reply-to: <53a52e1f0912272136p690d1a33h3a6e7e4369e4a9ee@mail.gmail.com>
"What happens when a client disconnects with unfinished work? Is the
work immediately reassigned, or does the server wait for a specified
period, etc. This could come up quite a lot because some clients will
just disconnect as soon as work they submitted is completed."

Ouch.  Good question.  Here's one solution: small tasks (expected time
<2 seconds) are always assigned to two or more clients/servers.  If
both disconnect, reassign the task; if only one disconnects, use the
other's answer.  For large tasks, if a computer stops checking in
every 5 or so seconds, give that computer's results to date to another
computer.  So yeah, I think a client should have to make regular
reports to a server.
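
Very roughly, the policy above could look like the sketch below.  This
is only an illustration in Python, not a proposal for the real thing:
the names (Scheduler, Task, reap, and so on) are made up, time is
passed in by hand so the behaviour is easy to follow, and "results to
date" is just whatever the computer last reported.

```python
from dataclasses import dataclass, field

SMALL_TASK_SECONDS = 2.0  # "expected time <2 seconds"
CHECK_IN_TIMEOUT = 5.0    # "checking in every 5 or so seconds"


@dataclass
class Task:
    name: str
    expected_seconds: float
    last_seen: dict = field(default_factory=dict)  # worker -> time of last check-in
    partial: dict = field(default_factory=dict)    # worker -> results reported so far

    @property
    def workers(self):
        return set(self.last_seen)


class Scheduler:
    """Hypothetical hub/server-side bookkeeping; not real modred code."""

    def __init__(self, idle):
        self.idle = list(idle)  # computers with nothing to do
        self.tasks = []

    def _give(self, task, now, seed=None):
        """Hand the task (plus any inherited partial results) to an idle computer."""
        if self.idle:
            worker = self.idle.pop(0)
            task.last_seen[worker] = now
            task.partial[worker] = seed

    def assign(self, task, now):
        # Small tasks go to two computers so that one disconnect costs nothing.
        copies = 2 if task.expected_seconds < SMALL_TASK_SECONDS else 1
        for _ in range(copies):
            self._give(task, now)
        self.tasks.append(task)

    def check_in(self, task, worker, partial, now):
        if worker in task.last_seen:  # ignore reports from dropped workers
            task.last_seen[worker] = now
            task.partial[worker] = partial

    def reap(self, now):
        """Drop silent workers; reassign a task only when nobody is left on it."""
        for task in self.tasks:
            silent = sorted(w for w, t in task.last_seen.items()
                            if now - t > CHECK_IN_TIMEOUT)
            inherited = None
            for worker in silent:
                del task.last_seen[worker]
                inherited = task.partial.pop(worker) or inherited
            if silent and not task.last_seen:
                self._give(task, now, seed=inherited)
```

The hub (or whoever ends up owning this) would call reap() on a timer.
A small task that loses one of its two computers just carries on with
the other; a large task that loses its only computer is handed to an
idle one together with the last partial results that were reported.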

Here's another problem: how do we tell how many CPU time credits to
grant a client?  We can't always tell how long a problem should take
beforehand.

Here's another problem: which computer should handle the clients?  As
I've been thinking about this, there are three types of computers, the
single hub, the various dedicated servers (capable of storing
permanent data), and the clients.  (The hub is necessary - without it,
the performance of the cluster drastically decreases.) So clients
connect to the hub, and then the hub directs all computers.  But the
hub will get overloaded if 100 computers are checking in every 10
seconds to give it more data (and then the hub has to pass this on to
other servers for storage, etc...).  So at some point, the hub needs
to tell the client to talk to the server.  When?
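
One possible answer to "when", purely as a sketch: the hub keeps a
client for itself while its total check-in rate stays under some
budget, and once the budget is used up it points new clients at the
least-loaded server.  Everything here (the Hub class, the
max_checkins_per_sec budget, the "hub" return value) is a made-up
assumption for illustration, in Python.

```python
class Hub:
    """Hypothetical hub that hands clients off once it is busy enough."""

    def __init__(self, servers, max_checkins_per_sec):
        self.server_load = {name: 0 for name in servers}  # server -> clients handed to it
        self.direct = {}  # client -> check-in interval in seconds
        self.max_rate = max_checkins_per_sec

    def rate(self):
        """Check-ins per second the hub currently handles itself."""
        return sum(1.0 / interval for interval in self.direct.values())

    def connect(self, client, interval):
        """Return "hub" if the hub keeps the client, else the server to talk to."""
        if self.rate() + 1.0 / interval <= self.max_rate:
            self.direct[client] = interval
            return "hub"
        # Over budget: redirect to the server with the fewest clients so far.
        server = min(self.server_load, key=self.server_load.get)
        self.server_load[server] += 1
        return server
```

For scale: 100 computers checking in every 10 seconds is 10 check-ins
per second at the hub, so the budget would have to be chosen with that
kind of number in mind.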

Who wants to create that prototype?

On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> This is where we start building prototypes. However, just to keep the
> theoretical side going: I disagree about the privacy issue.  Most
> operations that would benefit from the CPU time of a cluster (notice
> I'm not talking about the data storage and reliability benefits, which
> aren't affected by the presence of clients) are not very private.
> Rendering nice screensavers ("Electric Sheep", I think that one's
> called), and hefty data sifting aren't private - who cares about the
> screensavers, and the data is generally public anyway (of course if it
> wasn't, it would be marked so).
>
> Ray tracing and simulation could be more of an issue.  Hypothetical
> situation: Alice is simulating how wind will affect her proprietary
> airplane design.  Naturally, she can't hand off the whole design, or
> even parts of the design, to random client computers.  This is where
> the Windows programmer says, "so the client computers can't help
> Alice."  But that's not true - as a bad example, what if the Modred
> hub gave to a client computer 80 types of landing gear, and told the
> computer, not to simulate something, but to solve a general formula
> that could later be used in the computation in a trivial and quick
> way?  If that client is evil, it will learn that Alice's airplane has
> some sort of landing gear.
>
> Somebody needs to create a prototype of a server that can create
> arbitrary problems in some format, so we can all try to trick it.  I
> suggest Lisp as the language, but it's up to the implementer.
>
>
> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> ---------- Forwarded message ----------
>> From: Frederic Koehler <fkfire@xxxxxxxxx>
>> Date: Sun, 27 Dec 2009 23:21:28 -0500
>> Subject: Re: [Modred] Concept stuff
>> To: Scott Lawrence <bytbox@xxxxxxxxx>
>>
>>  * This is sort-of a solution (while obviously less-than-optimal
>> security, some grid-computing stuff does this, like BOINC), however, it
>> turns out that this may require custom validation methods - for example,
>> it's normal for floating point values to be different on different
>> computers, and the same could apply for other computations.
>>  * A malicious client would only need to misbehave on certain problems
>> that a malicious user could designate (or recognize obvious fake
>> programs), allowing the fake program test to work.
>>     - A better idea would be to randomly reduplicate some computations
>> many times - the malicious client wouldn't notice anything, but could
>> easily be singled out
>>
>>   * Thirdly is mostly the same thing I wrote before - only computations
>> that are said to be totally unimportant privacy wise could benefit from
>> client-side computing.
>>
>> So I really think that client-side computation is only a good idea for a
>> small subset of problems (like the type that there already exist massive
>> grid computing solutions for, like SETI@HOME)
>>
>> On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>
>>> I want clients to be used for computation, and I want maximum
>>> privacy+security given that restriction.  Some ideas:
>>>
>>> With a large network, two computers can perform the same computation.
>>> Furthermore, a smart modred hub can give fake problems to clients,
>>> just to make sure that they're operating correctly.  A client that
>>> isn't operating correctly gets cut. (No second chances! A program
>>> could exploit that!)
>>>
>>> If a user specifies a certain bit of data (SSN, for instance) as
>>> highly sensitive, modred should know not to hand off that computation
>>> to a client. (If it does by accident, it certainly should never hand
>>> off the data.) privacy++
>>>
>>> In all cases, computations should be anonymous. privacy++
>>>
>>> Other ideas?
>>>
>>>
>>> On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>> > The idea for client-side computation implies that we have
>>> > highly-trusted clients... (we know they won't provide invalid
>>> > answers) Otherwise, client-side computation requires verifying answers
>>> > and so is only useful for a few NP-ish problems. In addition (assuming
>>> > trusted clients).
>>> >
>>> > Also, it means that, since computations can contain sensitive data,
>>> > the ability to spread the computation is limited - unless we know the
>>> > computation is not user-sensitive, it can only try to use the user's
>>> > client(s). This way we also know that the client has no interest in
>>> > sabotaging answers to mess with other users (except to exploit
>>> > server-side weaknesses, which is inevitable).
>>> >
>>> > On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>> >
>>> >> Here is what, as I envision it, will make modred unique (and hard):
>>> >>
>>> >>  * Support for clients who can come and leave, lending CPU time and
>>> >> using CPU time as they choose.  There are some clusters that support
>>> >> this, but not very many.
>>> >>  * Support for computers participating across the internet.  This
>>> >> goes along with the previous point, but remember we need security to
>>> >> make this worth anything. This also means that user data could
>>> >> potentially be passed to untrusted computers - we need a way to
>>> >> prevent this.
>>> >>  * The ability for clients to run on any OS, using Perl, Python,
>>> >> Java, or (on Unix systems) C and C++ (servers and the hub will need
>>> >> to run on Linux or at least another Unix, or a dedicated OS which we
>>> >> may decide to write)
>>> >>  * Modred has great ease of use because it acts as a single unified
>>> >> computer - a special client program exists that allows one to log in,
>>> >> access and edit files, etc...  This is very close to unique - Google
>>> >> has it, though
>>> >>
>>> >> Because of that last point, many OS design issues should come up when
>>> >> we code modred. (I think Freddy pointed this out?) Thus, we have a
>>> >> chance to fix flaws in standard Unix, incorporating Plan 9-type stuff
>>> >> (google it and read about it - Plan 9 from Bell Labs, the way the
>>> >> future of Unix was) while also creating an actually usable user
>>> >> interface. (No offense, but to a newbie non-super-technical user,
>>> >> Linux is a bit harsh...)
>>> >>
>>> >> Some implementation questions and ideas:
>>> >>
>>> >>  - how will updates be handled?  Remember we've got 200 computers
>>> >> potentially, some of which might be clients that want to participate
>>> >> in multiple clusters.
>>> >>
>>> >>  - maybe we should have programs not include front ends.  Instead,
>>> >> the modred software creates a front-end from the program's self
>>> >> description.  This would enforce a consistent user interface if we
>>> >> could implement it well
>>> >>
>>> >>  - how can we keep users from being able to snoop on each other's
>>> >> data?
>>> >>
>>> >> That's just a sample to get people thinking.
>>> >>
>>> >>
>>> >> On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:
>>> >> > Before diving in to specifics about the implementation I think we
>>> >> > need to decide how we want modred to be different from (read:
>>> >> > better than) existing bootable cluster environments. Here is a short
>>> >> > list to check out:
>>> >> >
>>> >> > Bootable Cluster CD (http://bccd.net/) - folks presented this at
>>> >> > SC09 in Portland, it was pretty neat stuff. Packed with education /
>>> >> > debugging / visualization features
>>> >> >
>>> >> > Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) - very
>>> >> > trivially simple way to transform an existing unix lab into a
>>> >> > cluster resource
>>> >> >
>>> >> > Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!
>>> >> >
>>> >> > Perceus/warewulf (http://www.perceus.org/portal/) - a lot of other
>>> >> > sites made reference to this, haven't read too much about it
>>> >> >
>>> >> > What specifically do you want to improve over any of these?
>>> >> >
>>> >> >
>>> >> > On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>>> >> >>
>>> >> >>
>>> >> >> So, as far as I understand this project, the idea is to build
>>> >> >> both a client library and a program using the library to do
>>> >> >> clustering stuff, along with matching server/hub foo (the library
>>> >> >> might be the same or whatever, not important).
>>> >> >>
>>> >> >> So from this understanding, it seems that the system should
>>> >> >> provide some basic pseudo-operating system stuff and programs can
>>> >> >> build on that, just like they would normally build on their local
>>> >> >> libc/kernel and stuff.
>>> >> >>
>>> >> >> So (I sure like the word "so" today...) if we want the type of
>>> >> >> general os-like stuff it seems there needs to be support for:
>>> >> >>    * A simple message passing model - abstract away all the
>>> >> >> TCP-foo, maybe use existing foo here (obviously needs fleshing out)
>>> >> >>    * Permanent storage IO (clone the unix write(), read(), open()
>>> >> >> and sync() model, or maybe just use one of the existing
>>> >> >> database-ish nosql things out there)
>>> >> >>            - Unix-ish model - you create your data hunk, say you
>>> >> >> want all this stuff in it, then after sync() we know it's actually
>>> >> >> written somewhere on a hard-drive, and other things can read it too
>>> >> >>            - Unless this isn't in fact needed (but I assume it is)
>>> >> >>            - Also need to figure out if it's filesystem-ish foo
>>> >> >> (hierarchical) we want or more relational database-ish stuff
>>> >> >>
>>> >> >>    * A task delegation model - some type of map/reduce-ish stuff
>>> >> >>           - Servers have a few built-in computations, and client
>>> >> >> utilizes them?
>>> >> >>           - Or more complex, servers run sandboxed computational
>>> >> >> code?
>>> >> >>    * A security system?
>>> >> >>         - Needs fleshing out
>>> >> >>         - Presumably what the "hub" manages - it's the trusted
>>> >> >> thing
>>> >> >>         - Obviously, not everybody is allowed to use the cluster
>>> >> >> for computation, not everybody can find out what everybody else is
>>> >> >> doing, etc.
>>> >> >>       - But also, is there a limit on storage, are some things
>>> >> >> prioritized over others?
>>> >> >>
>>> >> >> Theoretically, servers are written to provide the io backend and
>>> >> >> to allow for task delegation, clients use the api, although the hub
>>> >> >> has its work cut out delegating all the file io and figuring out
>>> >> >> what the state of that is.
>>> >> >>
>>> >> >> On top of some mixture of this, one could build a simple unix-ish
>>> >> >> pseudo-cli, theoretically, as well as real software.
>>> >> >>
>>> >> >> Anyway, before actually doing anything, people should read about
>>> >> >> PVM (Parallel Virtual Machine) and the like (maybe also Hadoop and
>>> >> >> other foo-ish stuff) so Modred isn't just a bad clone of it
>>> >> >>
>>> >> >> Anyway, (yes, twice in a row!), I figured _someone_ had to respond
>>> >> >> to Scott, otherwise he'd feel all lonely and sad :P Now he can have
>>> >> >> a warm fuzzy feeling of deep confusion and uncertainty instead :P
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>> >> >> > ---------- Forwarded message ----------
>>> >> >> > From: Scott Lawrence <bytbox@xxxxxxxxx>
>>> >> >> > Date: Fri, 25 Dec 2009 19:20:13 -0500
>>> >> >> > Subject: Design Overview
>>> >> >> > To: modred <modred@xxxxxxxxxxxxxxxxxxx>
>>> >> >> >
>>> >> >> > I'm going to assume that everyone understands the basic concepts
>>> >> >> > for modred: a set of networked computers (by 'networked' I mean
>>> >> >> > they're all on the internet), divided for the sake of discussion
>>> >> >> > into three classes: the 'hub' (the dude in charge, which computers
>>> >> >> > that want to join connect to), the 'servers' (dedicated computers
>>> >> >> > that can be pretty much relied on not to go down, although
>>> >> >> > redundancy is always nice), and the 'clients' (computers that send
>>> >> >> > in requests and can be used for spare CPU cycles).
>>> >> >> >
>>> >> >> > Ok, so much for assumptions... :-)
>>> >> >> >
>>> >> >> > Things *I* think any design should emphasize:
>>> >> >> >  * security.
>>> >> >> >  * relative ease of use, while retaining significant power.
>>> >> >> > Challenging.  In particular, it should be possible to set up a
>>> >> >> > modred network in under an hour, provided the computers are
>>> >> >> > already set up.
>>> >> >> >  * along with the previous bullet point, having an interface
>>> >> >> > that lets one use the entire network like a single computer.  This
>>> >> >> > is sort of like the way Google Docs works, except the cloud is
>>> >> >> > private
>>> >> >> >  * therefore, it should be a multi-user system with
>>> >> >> > well-designed privileges etc...
>>> >> >> >
>>> >> >> > I'm not going to discuss my implementation ideas, let's hear
>>> >> >> > others first.
>>> >> >> >
>>> >> >> > --
>>> >> >> > Scott Lawrence
>>> >> >> >
>>> >> >> > Webmaster
>>> >> >> > The Blair Robot Project
>>> >> >> > Montgomery Blair High School
>>> >> >> >
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >>  Mailing list: https://launchpad.net/~modred
>>> >> >>  Post to     : modred@xxxxxxxxxxxxxxxxxxx
>>> >> >>  Unsubscribe : https://launchpad.net/~modred
>>> >> >>  More help   : https://help.launchpad.net/ListHelp
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >
>>>
>>>
>


-- 
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School