modred team mailing list archive

Re: Concept stuff

To: Frederic Koehler <fkfire@xxxxxxxxx>
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Mon, 28 Dec 2009 18:16:03 -0500
Cc: modred <modred@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <257ca8ac0912281037k49fdeaecm8312fea63443d1c4@mail.gmail.com>
What?

On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> On Mon, Dec 28, 2009 at 12:45 AM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>
>> "What happens when a client disconnects with unfinished work? Is the
>> work immediately reassigned, or does the server wait for a specified
>> period, etc. This could come up quite a lot because some clients will
>> just disconnect as soon as work they submitted is completed."
>>
>> Ouch.  Good question.  Here's one solution: small tasks (expected time
>> <2 seconds) are always assigned to two or more clients/servers.  If
>> both disconnect, reassign; if one disconnects, use the other guy's
>> answer. For large tasks, if a computer stops checking in every 5 or
>> so seconds, give that computer's results to date to another
>> computer.  So yeah, I think a client should have to make regular
>> reports to a server.
>>
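A minimal sketch of the disconnect handling described above, just to make the idea concrete. Everything named here (`Scheduler`, `assign`, `check_in`, `reap`, the constants) is made up for illustration and is not existing modred code; the 2-second and 5-second figures are the ones from the paragraph above.

```python
import time
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 5.0  # seconds of silence before a worker counts as gone
SMALL_TASK_LIMIT = 2.0   # tasks expected to finish faster than this are duplicated
SMALL_TASK_COPIES = 2    # "two or more clients/servers"


@dataclass
class Assignment:
    task_id: str
    workers: set
    partial: dict = field(default_factory=dict)  # worker -> last reported progress


class Scheduler:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_seen = {}    # worker -> time of last check-in
        self.assignments = {}  # task_id -> Assignment

    def assign(self, task_id, expected_seconds, idle_workers):
        """Small tasks go to several workers at once, large tasks to one."""
        copies = SMALL_TASK_COPIES if expected_seconds < SMALL_TASK_LIMIT else 1
        chosen = set(idle_workers[:copies])
        now = self.clock()
        for worker in chosen:
            self.last_seen[worker] = now
        self.assignments[task_id] = Assignment(task_id, chosen)
        return chosen

    def check_in(self, worker, task_id, progress):
        """Regular report from a worker, carrying its results to date."""
        self.last_seen[worker] = self.clock()
        self.assignments[task_id].partial[worker] = progress

    def reap(self, idle_workers):
        """Drop silent workers; hand fully orphaned tasks to a new worker.

        Returns {task_id: (new_worker, progress_to_date)}. Workers are taken
        from the front of idle_workers (the list is modified). A task with no
        idle worker available simply stays unassigned until the next call.
        """
        now = self.clock()
        reassigned = {}
        for job in self.assignments.values():
            dead = {w for w in job.workers
                    if now - self.last_seen[w] > HEARTBEAT_TIMEOUT}
            job.workers -= dead
            if dead and not job.workers and idle_workers:
                new_worker = idle_workers.pop(0)
                progress = next(
                    (job.partial[w] for w in dead if w in job.partial), None)
                job.workers.add(new_worker)
                self.last_seen[new_worker] = now
                reassigned[job.task_id] = (new_worker, progress)
        return reassigned
```

If one of the two copies of a small task goes silent, `reap` just drops it and the surviving worker's answer is used; only when every worker on a task is gone does the task move.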
>> Here's another problem: how do we tell how many CPU time credits to
>> grant a client?  We can't always tell how long a problem should take
>> beforehand.
>>
>> Here's another problem: which computer should handle the clients?  As
>> I've been thinking about this, there are three types of computers, the
>> single hub, the various dedicated servers (capable of storing
>> permanent data), and the clients.  (The hub is necessary - without it,
>> the performance of the cluster drastically decreases.) So clients
>> connect to the hub, and then the hub directs all computers.  But the
>> hub will get overloaded if 100 computers are checking in every 10
>> seconds to give it more data (and then the hub has to pass this on to
>> other servers for storage, etc...).  So at some point, the hub needs
>> to tell the client to talk to the server.  When?
>>
>> Who wants to create that prototype?
>>
>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> > This is where we start building prototypes. However, just to keep the
>> > theoretical side going: I disagree about the privacy issue.  Most
>> > operations that would benefit from the CPU time of a cluster (notice
>> > I'm not talking about the data storage and reliability benefits, which
>> > aren't affected by the presence of clients) are not very private.
>> > Rendering nice screensavers ("Electric Sheep", I think that one's
>> > called), and hefty data sifting aren't private - who cares about the
>> > screensavers, and the data is generally public anyway (of course if it
>> > wasn't, it would be marked so).
>> >
>> > Ray tracing and simulation could be more of an issue.  Hypothetical
>> > situation: Alice is simulating how wind will affect her proprietary
>> > airplane design.  Naturally, she can't hand off the whole design, or
>> > even parts of the design, to random client computers.  This is where
>> > the Windows programmer says, "so the client computers can't help
>> > Alice."  But that's not true - as a bad example, what if the Modred
>> > hub gave to a client computer 80 types of landing gear, and told the
>> > computer, not to simulate something, but to solve a general formula
>> > that could later be used in the computation in a trivial and quick
>> > way?  If that client is evil, it will learn that Alice's airplane has
>> > some sort of landing gear.
>> >
>> > Somebody needs to create a prototype of a server that can create
>> > arbitrary problems in some format, so we can all try to trick it.  I
>> > suggest Lisp as the language, but it's up to the implementer.
>> >
>> >
>> > On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> >> ---------- Forwarded message ----------
>> >> From: Frederic Koehler <fkfire@xxxxxxxxx>
>> >> Date: Sun, 27 Dec 2009 23:21:28 -0500
>> >> Subject: Re: [Modred] Concept stuff
>> >> To: Scott Lawrence <bytbox@xxxxxxxxx>
>> >>
>> >>  * This is sort of a solution (while obviously less-than-optimal
>> >> security, some grid-computing stuff does this, like BOINC). However, it
>> >> turns out that this may require custom validation methods - for example,
>> >> it's normal for floating point values to be different on different
>> >> computers, and the same could apply for other computations.
>> >>  * A malicious client would only need to misbehave on certain problems
>> >> that a malicious user could designate (or recognize obvious fake
>> >> programs), allowing the fake program test to work.
>> >>     - A better idea would be to randomly duplicate some computations
>> >> many times - the malicious client wouldn't notice anything, but could
>> >> easily be singled out
>> >>
>> >>   * The third point is mostly the same thing I wrote before - only
>> >> computations that are said to be totally unimportant privacy-wise could
>> >> benefit from client-side computing.
>> >>
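A rough sketch of the "custom validation" point above: when the same computation is sent to several clients, floating point answers have to be compared with a tolerance rather than with `==`, and whoever disagrees with the majority can be singled out. The function names and the tolerance values are assumptions for illustration; this is not how BOINC or modred actually does it.

```python
import math
import random


def results_agree(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Treat two floats as the same answer if they differ only by rounding noise."""
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)


def find_outliers(results, rel_tol=1e-9):
    """results maps client -> float answer for one duplicated computation.

    Returns the clients whose answer disagrees with the best-supported one.
    If no answer is backed by a strict majority, nobody is trusted.
    """
    clients = list(results)

    def support(c):
        # how many clients (c included) report an answer close to c's
        return sum(results_agree(results[c], results[d], rel_tol)
                   for d in clients)

    reference = max(clients, key=support)
    if support(reference) <= len(clients) / 2:
        return set(clients)
    return {c for c in clients
            if not results_agree(results[c], results[reference], rel_tol)}


def pick_for_audit(task_ids, fraction, rng):
    """Randomly choose which tasks get duplicated across many clients."""
    return [t for t in task_ids if rng.random() < fraction]
```

Because the audited tasks are ordinary tasks picked at random, a client has nothing to recognize, which is the advantage over handing out obviously fake problems.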
>> >> So I really think that client-side computation is only a good idea for
>> >> a small subset of problems (like the type that there already exist
>> >> massive grid computing solutions for, like SETI@HOME)
>> >>
>> >> On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> >>
>> >>> I want clients to be used for computation, and I want maximum
>> >>> privacy+security given that restriction.  Some ideas:
>> >>>
>> >>> With a large network, two computers can perform the same computation.
>> >>> Furthermore, a smart modred hub can give fake problems to clients,
>> >>> just to make sure that they're operating correctly.  A client that
>> >>> isn't operating correctly gets cut. (No second chances! A program
>> >>> could exploit that!)
>> >>>
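The fake-problem idea above could look something like this sketch. It assumes the hub keeps a table of problems whose answers it already knows and bans a client on its first wrong answer (no second chances). `SpotChecker` and `submit` are hypothetical names, not part of any existing modred code.

```python
class SpotChecker:
    """Mixes known-answer problems into normal work to catch bad clients."""

    def __init__(self, known_answers):
        self.known = dict(known_answers)  # problem -> answer the hub already knows
        self.banned = set()               # clients that have been cut

    def submit(self, client, problem, answer):
        """Return True if the hub accepts this result."""
        if client in self.banned:
            return False                  # cut clients stay cut
        if problem in self.known and answer != self.known[problem]:
            self.banned.add(client)       # one wrong answer on a check is enough
            return False
        return True
```

Results for real problems (the ones not in the table) are accepted unchecked here, so this only works if a client cannot tell a check from real work.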
>> >>> If a user specifies a certain bit of data (SSN, for instance) as
>> >>> highly sensitive, modred should know not to hand off that computation
>> >>> to a client. (If it does by accident, it certainly should never hand
>> >>> off the data.) privacy++
>> >>>
>> >>> In all cases, computations should be anonymous. privacy++
>> >>>
>> >>> Other ideas?
>> >>>
>> >>>
>> >>> On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>> >>> > The idea for client-side computation implies that we have
>> >>> > highly-trusted clients (we know they won't provide invalid answers).
>> >>> > Otherwise, client-side computation requires verifying answers and so is
>> >>> > only useful for a few NP-ish problems.
>> >>> >
>> >>> > In addition (assuming trusted clients), since computations can contain
>> >>> > sensitive data, the ability to spread the computation is limited -
>> >>> > unless we know the computation is not user-sensitive, it can only try to
>> >>> > use the user's client(s). This way we also know that the client has no
>> >>> > interest in sabotaging answers to mess with other users (except to
>> >>> > exploit server-side weaknesses, which is inevitable).
>> >>> >
>> >>> > On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> >>> >
>> >>> >> Here is what, as I envision it, will make modred unique (and hard):
>> >>> >>
>> >>> >>  * Support for clients who can come and leave, lending CPU time and
>> >>> >> using CPU time as they choose.  There are some clusters that support
>> >>> >> this, but not very many.
>> >>> >>  * Support for computers participating across the internet.  This goes
>> >>> >> along with the previous part, but remember we need security to make
>> >>> >> this worth anything. This also means that user data could potentially
>> >>> >> be passed to untrusted computers - we need a way to prevent this.
>> >>> >>  * The ability for clients to run on any OS, using perl, python, java,
>> >>> >> or (on unix systems) C and C++ (servers and the hub will need to run
>> >>> >> on linux or at least another unix, or a dedicated OS which we may
>> >>> >> decide to write)
>> >>> >>  * Modred has great ease of use because it acts as a single unified
>> >>> >> computer - a special client program exists that allows one to log in,
>> >>> >> access and edit files, etc...  This is very close to unique - google
>> >>> >> has it, though
>> >>> >>
>> >>> >> Because of that last point, many OS design issues should come up when
>> >>> >> we code modred. (I think Freddy pointed this out?) Thus, we have a
>> >>> >> chance to fix flaws in standard unix, incorporating plan 9-type stuff
>> >>> >> (google it and read about it - Plan 9 from Bell Labs, the way the
>> >>> >> future of unix was) while also creating an actually usable user
>> >>> >> interface. (No offense, but to a newbie non-super-technical user,
>> >>> >> linux is a bit harsh...)
>> >>> >>
>> >>> >> Some implementation questions and ideas:
>> >>> >>
>> >>> >>  - how will updates be handled?  Remember we've got 200 computers
>> >>> >> potentially, some of which might be clients that want to participate
>> >>> >> in multiple clusters.
>> >>> >>
>> >>> >>  - maybe we should have programs not include front ends.  Instead, the
>> >>> >> modred software creates a front-end from the program's self
>> >>> >> description.  This would enforce a consistent user interface if we
>> >>> >> could implement it well
>> >>> >>
>> >>> >>  - how can we keep users from being able to snoop on each others'
>> >>> >> data?
>> >>> >>
>> >>> >> That's just a sample to get people thinking.
>> >>> >>
>> >>> >>
>> >>> >> On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:
>> >>> >> > Before diving in to specifics about the implementation I think we need
>> >>> >> > to decide how we want modred to be different from (read: better than)
>> >>> >> > existing bootable cluster environments. Here is a short list to check
>> >>> >> > out:
>> >>> >> >
>> >>> >> > Bootable Cluster CD (http://bccd.net/) - folks presented this at SC09
>> >>> >> > in Portland, it was pretty neat stuff. Packed with education /
>> >>> >> > debugging / visualization features
>> >>> >> >
>> >>> >> > Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) - very
>> >>> >> > trivially simple way to transform an existing unix lab into a cluster
>> >>> >> > resource
>> >>> >> >
>> >>> >> > Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!
>> >>> >> >
>> >>> >> > Perceus/warewulf (http://www.perceus.org/portal/) - a lot of other
>> >>> >> > sites made reference to this, haven't read too much about it
>> >>> >> >
>> >>> >> > What specifically do you want to improve over any of these?
>> >>> >> >
>> >>> >> >
>> >>> >> > On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> So, as far as I understand this project, the idea is to build
>> >>> >> >> both a client library and a program using the library to do
>> >>> >> >> clustering stuff, along with matching server/hub foo (the library
>> >>> >> >> might be the same or whatever, not important).
>> >>> >> >>
>> >>> >> >> So from this understanding, it seems that the system should provide
>> >>> >> >> some basic pseudo-operating system stuff and programs can build on
>> >>> >> >> that, just like they would normally build on their local
>> >>> >> >> libc/kernel and stuff.
>> >>> >> >>
>> >>> >> >> So (I sure like the word "so" today...) if we want the type of
>> >>> >> >> general os-like stuff it seems there needs to be support for:
>> >>> >> >>    * A simple message passing model - abstract away all the
>> >>> >> >> TCP-foo, maybe use existing foo here (obviously needs fleshing out)
>> >>> >> >>    * Permanent storage IO (clone the unix write(), read(), open()
>> >>> >> >> and sync() model, or maybe just use one of the existing
>> >>> >> >> database-ish nosql things out there)
>> >>> >> >>            - Unix-ish model - you create your data hunk, say you
>> >>> >> >> want all this stuff in it, then after sync() we know it's actually
>> >>> >> >> written somewhere on a hard drive, and other things can read it too
>> >>> >> >>            - Unless this isn't in fact needed (but I assume it is)
>> >>> >> >>            - Also need to figure out if it's filesystem-ish foo
>> >>> >> >> (hierarchical) we want or more relational database-ish stuff
>> >>> >> >>
>> >>> >> >>    * A task delegation model - some type of map/reduce-ish stuff
>> >>> >> >>           - Servers have a few built-in computations, and client
>> >>> >> >> utilizes them?
>> >>> >> >>           - Or more complex, servers run sandboxed computational
>> >>> >> >> code?
>> >>> >> >>    * A security system?
>> >>> >> >>         - Needs fleshing out
>> >>> >> >>         - Presumably what the "hub" manages - it's the trusted
>> >>> >> >> thing
>> >>> >> >>         - Obviously, not everybody is allowed to use the cluster
>> >>> >> >> for computation, not everybody can find out what everybody else is
>> >>> >> >> doing, etc.
>> >>> >> >>         - But also, is there a limit on storage, are some things
>> >>> >> >> prioritized over others?
>> >>> >> >>
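To make the unix-ish storage bullet above concrete, here is a tiny in-memory sketch of the semantics it describes: writes stay private to the writer until sync(), and only then can other readers see them. `Store`, `Handle` and `open_hunk` are invented names, and a dict stands in for the servers' disks.

```python
class Store:
    """Stand-in for the servers' permanent storage."""

    def __init__(self):
        self.durable = {}  # hunk name -> bytes that have been synced


class Handle:
    def __init__(self, store, name):
        self.store = store
        self.name = name
        # start from whatever has already been synced under this name
        self.pending = bytearray(store.durable.get(name, b""))

    def write(self, data):
        """Append to this handle's private buffer; nobody else sees it yet."""
        self.pending += data

    def read(self):
        """Return only data that some writer has already synced."""
        return bytes(self.store.durable.get(self.name, b""))

    def sync(self):
        """Publish the buffer; after this returns, other handles can read it."""
        self.store.durable[self.name] = bytes(self.pending)


def open_hunk(store, name):
    """Open (or create) a named data hunk."""
    return Handle(store, name)
```

The sketch is hierarchy-agnostic: names are opaque keys, so it does not settle the filesystem-versus-database question raised in the same bullet.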
>> >>> >> >> Theoretically, servers are written to provide the io backend and to
>> >>> >> >> allow for task delegation, clients use the api, although hub has its
>> >>> >> >> work cut out delegating all the file io and figuring out what the
>> >>> >> >> state of that is.
>> >>> >> >>
>> >>> >> >> On top of some mixture of this, one could build a simple unix-ish
>> >>> >> >> pseudo-cli, theoretically, as well as real software.
>> >>> >> >>
>> >>> >> >> Anyway, before actually doing anything, people should read about PVM
>> >>> >> >> (Parallel Virtual Machine) and the like (maybe also Hadoop and other
>> >>> >> >> foo-ish stuff) so Modred isn't just a bad clone of it
>> >>> >> >>
>> >>> >> >> Anyway, (yes, twice in a row!), I figured _someone_ had to respond
>> >>> >> >> to Scott, otherwise he'd feel all lonely and sad :P Now he can have
>> >>> >> >> a warm fuzzy feeling of deep confusion and uncertainty instead :P
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>> >>> >> >> > ---------- Forwarded message ----------
>> >>> >> >> > From: Scott Lawrence <bytbox@xxxxxxxxx>
>> >>> >> >> > Date: Fri, 25 Dec 2009 19:20:13 -0500
>> >>> >> >> > Subject: Design Overview
>> >>> >> >> > To: modred <modred@xxxxxxxxxxxxxxxxxxx>
>> >>> >> >> >
>> >>> >> >> > I'm going to assume that everyone understands the basic concepts for
>> >>> >> >> > modred: a set of networked computers (by 'networked' I mean, they're
>> >>> >> >> > all on the internet), divided for the sake of discussion into three
>> >>> >> >> > classes: the 'hub' (the dude in charge, who computers that want to
>> >>> >> >> > join connect to), the 'servers' (dedicated computers that can be
>> >>> >> >> > pretty much relied on not to go down, although redundancy is always
>> >>> >> >> > nice), and the 'clients' (computers that send in requests and can be
>> >>> >> >> > used for spare CPU cycles).
>> >>> >> >> >
>> >>> >> >> > Ok, so much for assumptions... :-)
>> >>> >> >> >
>> >>> >> >> > Things *I* think any design should emphasize:
>> >>> >> >> >  * security.
>> >>> >> >> >  * relative ease of use, while retaining significant power.
>> >>> >> >> > Challenging.  In particular, it should be possible to set up a modred
>> >>> >> >> > network in under an hour, provided the computers are already set up.
>> >>> >> >> >  * along with the previous bullet point, having an interface that lets
>> >>> >> >> > one use the entire network like a single computer.  This is sort of
>> >>> >> >> > like the way google docs works, except the cloud is private
>> >>> >> >> >  * therefore, it should be a multi-user system with well-designed
>> >>> >> >> > privileges etc...
>> >>> >> >> >
>> >>> >> >> > I'm not going to discuss my implementation ideas, let's hear others
>> >>> >> >> > first.
>> >>> >> >> >
>> >>> >> >> > --
>> >>> >> >> > Scott Lawrence
>> >>> >> >> >
>> >>> >> >> > Webmaster
>> >>> >> >> > The Blair Robot Project
>> >>> >> >> > Montgomery Blair High School
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> _______________________________________________
>> >>> >> >>  Mailing list: https://launchpad.net/~modred
>> >>> >> >>  Post to     : modred@xxxxxxxxxxxxxxxxxxx
>> >>> >> >>  Unsubscribe : https://launchpad.net/~modred
>> >>> >> >>  More help   : https://help.launchpad.net/ListHelp
>> >>> >> >>
>> >>> >> >>
>> >>> >> >
>> >>> >> >
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>>
>


-- 
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School