
modred team mailing list archive

Re: Concept stuff

 

On Mon, Dec 28, 2009 at 12:45 AM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

> "What happens when a client disconnects with unfinished work? Is the
> work immediately reassigned, or does the server wait for a specified
> period, etc. This could come up quite a lot because some clients will
> just disconnect as soon as work they submitted is completed."
>
> Ouch.  Good question.  Here's one solution: small tasks (expected time
> <2 seconds) are always assigned to two or more clients/servers.  If
> both disconnect, reassign; if only one disconnects, use the other guy's
> answer.  For large tasks, if a computer stops checking in every 5 or so
> seconds, give that computer's results to date to another computer.  So
> yeah, I think a client should have to make regular reports to a
> server.
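
A minimal sketch of the check-in scheme described above, to make the rules concrete. The 2-second and 5-second figures come from the message; the names (Task, check_in, reap, needs_reassignment) and the idea of storing a "partial result" per worker are assumptions made for illustration, not an agreed modred design.

```python
from dataclasses import dataclass, field

SMALL_TASK_SECONDS = 2.0  # from the thread: "expected time <2 seconds"
HEARTBEAT_TIMEOUT = 5.0   # from the thread: "checking in every 5 or so seconds"
REPLICAS_FOR_SMALL = 2    # from the thread: "two or more clients/servers"


@dataclass
class Task:
    task_id: str
    expected_seconds: float
    # worker name -> time of its last check-in (hypothetical bookkeeping)
    workers: dict = field(default_factory=dict)
    # worker name -> latest partial result it reported
    partial: dict = field(default_factory=dict)


def replicas_needed(task):
    """Small tasks are duplicated up front; large tasks get one worker."""
    if task.expected_seconds < SMALL_TASK_SECONDS:
        return REPLICAS_FOR_SMALL
    return 1


def check_in(task, worker, now, partial_result=None):
    """Record a regular report from a worker."""
    task.workers[worker] = now
    if partial_result is not None:
        task.partial[worker] = partial_result


def reap(task, now):
    """Drop workers that missed the heartbeat window.

    Returns (dead_workers, handoff), where handoff is the last partial
    result of a dead worker (to pass to a replacement), or None.
    """
    dead = [w for w, seen in task.workers.items()
            if now - seen > HEARTBEAT_TIMEOUT]
    handoff = None
    for w in dead:
        del task.workers[w]
        if w in task.partial:
            handoff = task.partial.pop(w)
    return dead, handoff


def needs_reassignment(task):
    """Reassign only when nobody is left working on the task."""
    return len(task.workers) == 0
```

With this rule a small task survives the loss of one of its two workers without any reassignment, and a large task is handed to a new computer together with whatever the dead one last reported.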
>
> Here's another problem: how do we tell how many CPU time credits to
> grant a client?  We can't always tell how long a problem should take
> beforehand.
>
> Here's another problem: which computer should handle the clients?  As
> I've been thinking about this, there are three types of computers, the
> single hub, the various dedicated servers (capable of storing
> permanent data), and the clients.  (The hub is necessary - without it,
> the performance of the cluster drastically decreases.) So clients
> connect to the hub, and then the hub directs all computers.  But the
> hub will get overloaded if 100 computers are checking in every 10
> seconds to give it more data (and then the hub has to pass this on to
> other servers for storage, etc...).  So at some point, the hub needs
> to tell the client to talk to the server.  When?
>
> Who wants to create that prototype?
>
> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> > This is where we start building prototypes. However, just to keep the
> > theoretical side going: I disagree about the privacy issue.  Most
> > operations that would benefit from the CPU time of a cluster (notice
> > I'm not talking about the data storage and reliability benefits, which
> > aren't affected by the presence of clients) are not very private.
> > Rendering nice screensavers ("Electric Sheep", I think that one's
> > called), and hefty data sifting aren't private - who cares about the
> > screensavers, and the data is generally public anyway (of course if it
> > wasn't, it would be marked so).
> >
> > Ray tracing and simulation could be more of an issue.  Hypothetical
> > situation: Alice is simulating how wind will affect her proprietary
> > airplane design.  Naturally, she can't hand off the whole design, or
> > even parts of the design, to random client computers.  This is where
> > the Windows programmer says, "so the client computers can't help
> > Alice."  But that's not true - as a bad example, what if the Modred
> > hub gave to a client computer 80 types of landing gear, and told the
> > computer, not to simulate something, but to solve a general formula
> > that could later be used in the computation in a trivial and quick
> > way?  If that client is evil, it will learn that Alice's airplane has
> > some sort of landing gear.
> >
> > Somebody needs to create a prototype of a server that can create
> > arbitrary problems in some format, so we can all try to trick it.  I
> > suggest Lisp as the language, but it's up to the implementer.
> >
> >
> > On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> >> ---------- Forwarded message ----------
> >> From: Frederic Koehler <fkfire@xxxxxxxxx>
> >> Date: Sun, 27 Dec 2009 23:21:28 -0500
> >> Subject: Re: [Modred] Concept stuff
> >> To: Scott Lawrence <bytbox@xxxxxxxxx>
> >>
> >>  * This is sort-of a solution (while obviously less-than-optimal
> >> security, some grid-computing stuff does this, like BOINC), however,
> >> it turns out that this may require custom validation methods - for
> >> example, it's normal for floating point values to be different on
> >> different computers, and the same could apply for other
> >> computations.
> >>  * A malicious client would only need to misbehave on certain
> >> problems that a malicious user could designate (or recognize obvious
> >> fake programs), allowing the fake program test to work.
> >>     - A better idea would be to randomly reduplicate some
> >> computations many times - the malicious client wouldn't notice
> >> anything, but could easily be singled out
> >>
> >>   * Thirdly is mostly the same thing I wrote before - only
> >> computations that are said to be totally unimportant privacy wise
> >> could benefit from client-side computing.
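
The first two points above (tolerant comparison of floating point results, and replicating a computation to single out a bad client) can be sketched together. This is only an illustration: the function name flag_outliers, the use of the median as the consensus value, and the tolerance defaults are all assumptions, not something BOINC or modred prescribes.

```python
import math
from statistics import median


def flag_outliers(results, rel_tol=1e-9, abs_tol=1e-12):
    """Compare replicated results from several clients.

    results maps a client name to the float it returned for the same
    computation.  The median is taken as the consensus value, and any
    client whose answer is not within tolerance of it is reported as a
    suspect.  Exact equality is deliberately avoided, since honest
    clients can differ in the last few bits.
    """
    consensus = median(results.values())
    suspects = sorted(
        client for client, value in results.items()
        if not math.isclose(value, consensus,
                            rel_tol=rel_tol, abs_tol=abs_tol)
    )
    return consensus, suspects


# Hypothetical usage: three honest clients with tiny rounding
# differences, and one client returning a wrong value.
consensus, suspects = flag_outliers(
    {"a": 0.1 + 0.2, "b": 0.3, "c": 0.1 + 0.2, "d": 0.35}
)
```

A median only works when most replicas are honest, and a single scalar is the easy case; structured results would need the "custom validation methods" mentioned above.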
> >>
> >> So I really think that client-side computation is only a good idea for a
> >> small subset of problems (the type for which massive grid computing
> >> solutions already exist, like SETI@home).
> >>
> >> On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx>
> >> wrote:
> >>
> >>> I want clients to be used for computation, and I want maximum
> >>> privacy+security given that restriction.  Some ideas:
> >>>
> >>> With a large network, two computers can perform the same computation.
> >>> Furthermore, a smart modred hub can give fake problems to clients,
> >>> just to make sure that they're operating correctly.  A client that
> >>> isn't operating correctly gets cut. (No second chances! A program
> >>> could exploit that!)
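
A rough sketch of the "fake problems" idea, assuming the hub keeps a table of problems whose answers it already knows. The class name SpotChecker, the check_rate parameter and the string-keyed problem table are hypothetical; the only rule taken from the message is that a client failing a check is cut with no second chance.

```python
import random


class SpotChecker:
    """Mixes known-answer problems into the work handed to clients."""

    def __init__(self, known_answers, check_rate=0.1, rng=None):
        self.known = dict(known_answers)  # problem -> expected answer
        self.check_rate = check_rate      # fraction of handouts that are checks
        self.rng = rng or random.Random()
        self.banned = set()

    def next_problem(self, real_problem):
        """Return (problem, is_check) for the next handout."""
        if self.known and self.rng.random() < self.check_rate:
            return self.rng.choice(sorted(self.known)), True
        return real_problem, False

    def record(self, client, problem, answer, is_check):
        """Return True if the answer is accepted.

        A client that fails a check is banned immediately, and nothing
        it sends afterwards is accepted.
        """
        if client in self.banned:
            return False
        if is_check and self.known[problem] != answer:
            self.banned.add(client)
            return False
        return True
```

This only catches clients that cannot tell a check from real work, so the check problems would have to look like ordinary ones.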
> >>>
> >>> If a user specifies a certain bit of data (SSN, for instance) as
> >>> highly sensitive, modred should know not to hand off that computation
> >>> to a client. (If it does by accident, it certainly should never hand
> >>> off the data.) privacy++
> >>>
> >>> In all cases, computations should be anonymous. privacy++
> >>>
> >>> Other ideas?
> >>>
> >>>
> >>> On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>> > The idea for client-side computation implies that we have
> >>> > highly-trusted clients... (we know they won't provide invalid
> >>> > answers) Otherwise, client-side computation requires verifying
> >>> > answers and so is only useful for a few NP-ish problems. In
> >>> > addition (assuming trusted clients).
> >>> >
> >>> > Also, it means that, since computations can contain sensitive data,
> >>> > the ability to spread the computation is limited - unless we know
> >>> > the computation is not user-sensitive, it can only try to use the
> >>> > user's client(s). This way we also know that the client has no
> >>> > interest in sabotaging answers to mess with other users (except to
> >>> > exploit server-side weaknesses, which is inevitable).
> >>> >
> >>> > On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx>
> >>> wrote:
> >>> >
> >>> >> Here is what, as I envision it, will make modred unique (and hard):
> >>> >>
> >>> >>  * Support for clients who can come and leave, lending CPU time and
> >>> >> using CPU time as they choose.  There are some clusters that support
> >>> >> this, but not very many.
> >>> >>  * Support for computers participating across the internet.  This
> >>> >> goes along with the previous part, but remember we need security to
> >>> >> make this worth anything. This also means that user data could
> >>> >> potentially be passed to untrusted computers - we need a way to
> >>> >> prevent this.
> >>> >>  * The ability for clients to run on any OS, using perl, python,
> >>> >> java, or (on unix systems) C and C++ (servers and the hub will need
> >>> >> to run on linux or at least another unix, or a dedicated OS which we
> >>> >> may decide to write)
> >>> >>  * Modred has great ease of use because it acts as a single unified
> >>> >> computer - a special client program exists that allows one to log
> >>> >> in, access and edit files, etc...  This is very close to unique -
> >>> >> google has it, though
> >>> >>
> >>> >> Because of that last point, many OS design issues should come up
> >>> >> when we code modred. (I think Freddy pointed this out?) Thus, we
> >>> >> have a chance to fix flaws in standard unix, incorporating
> >>> >> Plan 9-type stuff (google it and read about it - Plan 9 from Bell
> >>> >> Labs, the way the future of unix was) while also creating an
> >>> >> actually usable user interface. (No offense, but to a newbie
> >>> >> non-super-technical user, linux is a bit harsh...)
> >>> >>
> >>> >> Some implementation questions and ideas:
> >>> >>
> >>> >>  - how will updates be handled?  Remember we've got 200 computers
> >>> >> potentially, some of which might be clients that want to participate
> >>> >> in multiple clusters.
> >>> >>
> >>> >>  - maybe we should have programs not include front ends.  Instead,
> >>> >> the modred software creates a front-end from the program's self
> >>> >> description.  This would enforce a consistent user interface if we
> >>> >> could implement it well
> >>> >>
> >>> >>  - how can we keep users from being able to snoop on each others'
> >>> >> data?
> >>> >>
> >>> >> That's just a sample to get people thinking.
> >>> >>
> >>> >>
> >>> >> On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:
> >>> >> > Before diving into specifics about the implementation I think we
> >>> >> > need to decide how we want modred to be different from (read:
> >>> >> > better than) existing bootable cluster environments. Here is a
> >>> >> > short list to check out:
> >>> >> >
> >>> >> > Bootable Cluster CD (http://bccd.net/) - folks presented this at
> >>> >> > SC09 in Portland, it was pretty neat stuff. Packed with education /
> >>> >> > debugging / visualization features
> >>> >> >
> >>> >> > Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) - very
> >>> >> > trivially simple way to transform an existing unix lab into a
> >>> >> > cluster resource
> >>> >> >
> >>> >> > Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!
> >>> >> >
> >>> >> > Perceus/warewulf (http://www.perceus.org/portal/) - a lot of other
> >>> >> > sites made reference to this, haven't read too much about it
> >>> >> >
> >>> >> > What specifically do you want to improve over any of these?
> >>> >> >
> >>> >> >
> >>> >> > On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:
> >>> >> >>
> >>> >> >>
> >>> >> >> So, as far as I understand this project, the idea is to build
> >>> >> >> both a client library and a program using the library to do
> >>> >> >> clustering stuff, along with matching server/hub foo (the library
> >>> >> >> might be the same or whatever, not important).
> >>> >> >>
> >>> >> >> So from this understanding, it seems that the system should
> >>> >> >> provide some basic pseudo-operating system stuff and programs can
> >>> >> >> build on that, just like they would normally build on their local
> >>> >> >> libc/kernel and stuff.
> >>> >> >>
> >>> >> >> So (I sure like the word "so" today...) if we want the type of
> >>> >> >> general os-like stuff it seems there needs to be support for:
> >>> >> >>    * A simple message passing model - abstract away all the
> >>> >> >>      TCP-foo, maybe use existing foo here (obviously needs
> >>> >> >>      fleshing out)
> >>> >> >>    * Permanent storage IO (clone the unix write(), read(), open()
> >>> >> >>      and sync() model, or maybe just use one of the existing
> >>> >> >>      database-ish nosql things out there)
> >>> >> >>         - Unix-ish model - you create your data hunk, say you
> >>> >> >>           want all this stuff in it, then after sync() we know
> >>> >> >>           it's actually written somewhere on a hard-drive, and
> >>> >> >>           other things can read it too
> >>> >> >>         - Unless this isn't in fact needed (but I assume it is)
> >>> >> >>         - Also need to figure out if it's filesystem-ish foo
> >>> >> >>           (hierarchical) we want or more relational
> >>> >> >>           database-ish stuff
> >>> >> >>
> >>> >> >>    * A task delegation model - some type of map/reduce-ish stuff
> >>> >> >>         - Servers have a few built-in computations, and client
> >>> >> >>           utilizes them?
> >>> >> >>         - Or more complex, servers run sandboxed computational
> >>> >> >>           code?
> >>> >> >>    * A security system?
> >>> >> >>         - Needs fleshing out
> >>> >> >>         - Presumably what the "hub" manages - it's the trusted
> >>> >> >>           thing
> >>> >> >>         - Obviously, not everybody is allowed to use the cluster
> >>> >> >>           for computation, not everybody can find out what
> >>> >> >>           everybody else is doing, etc.
> >>> >> >>         - But also, is there a limit on storage, are some things
> >>> >> >>           prioritized over others?
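
To make the "Unix-ish model" from the storage bullet concrete, here is a toy in-memory sketch: writes go to a private buffer, and only sync() publishes them so that other readers can see them. The names HunkStore and Hunk are invented for this example, a dict stands in for the hard drive, and nothing here is distributed or actually durable.

```python
class HunkStore:
    """Stand-in for the cluster's permanent storage (just a dict here)."""

    def __init__(self):
        self._durable = {}  # hunk name -> bytes that have been synced

    def open(self, name):
        """Open a hunk for writing, starting from its synced contents."""
        return Hunk(self, name)

    def read(self, name):
        """Read what other parties can see: synced data only."""
        return self._durable.get(name, b"")


class Hunk:
    """A writable handle whose changes stay private until sync()."""

    def __init__(self, store, name):
        self._store = store
        self._name = name
        self._pending = bytearray(store.read(name))

    def write(self, data):
        """Append bytes to the private buffer."""
        self._pending += data

    def sync(self):
        """Publish the buffer; after this, readers see the new data."""
        self._store._durable[self._name] = bytes(self._pending)


# Hypothetical usage
store = HunkStore()
hunk = store.open("results")
hunk.write(b"partial ")
hunk.write(b"answer")
before = store.read("results")  # still empty, nothing synced yet
hunk.sync()
after = store.read("results")   # now b"partial answer"
```

Whether hunks live in a hierarchy or in something more database-like is left open, exactly as in the list above.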
> >>> >> >>
> >>> >> >> Theoretically, servers are written to provide the io backend and
> >>> >> >> to allow for task delegation, clients use the api, although the
> >>> >> >> hub has its work cut out delegating all the file io and figuring
> >>> >> >> out what the state of that is.
> >>> >> >>
> >>> >> >> On top of some mixture of this, one could build a simple unix-ish
> >>> >> >> pseudo-cli, theoretically, as well as real software.
> >>> >> >>
> >>> >> >> Anyway, before actually doing anything, people should read about
> >>> >> >> PVM (Parallel Virtual Machine) and the like (maybe also Hadoop and
> >>> >> >> other foo-ish stuff) so Modred isn't just a bad clone of it
> >>> >> >>
> >>> >> >> Anyway, (yes, twice in a row!), I figured _someone_ had to
> >>> >> >> respond to Scott, otherwise he'd feel all lonely and sad :P Now he
> >>> >> >> can have a warm fuzzy feeling of deep confusion and uncertainty
> >>> >> >> instead :P
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence
> >>> >> >> <bytbox@xxxxxxxxx>
> >>> >> wrote:
> >>> >> >> > ---------- Forwarded message ----------
> >>> >> >> > From: Scott Lawrence <bytbox@xxxxxxxxx>
> >>> >> >> > Date: Fri, 25 Dec 2009 19:20:13 -0500
> >>> >> >> > Subject: Design Overview
> >>> >> >> > To: modred <modred@xxxxxxxxxxxxxxxxxxx>
> >>> >> >> >
> >>> >> >> > I'm going to assume that everyone understands the basic
> >>> >> >> > concepts for modred: a set of networked computers (by
> >>> >> >> > 'networked' I mean, they're all on the internet), divided for
> >>> >> >> > the sake of discussion into three classes: the 'hub' (the dude
> >>> >> >> > in charge, who computers that want to join connect to), the
> >>> >> >> > 'servers' (dedicated computers that can be pretty much relied
> >>> >> >> > on not to go down, although redundancy is always nice), and the
> >>> >> >> > 'clients' (computers that send in requests and can be used for
> >>> >> >> > spare CPU cycles).
> >>> >> >> >
> >>> >> >> > Ok, so much for assumptions... :-)
> >>> >> >> >
> >>> >> >> > Things *I* think any design should emphasize:
> >>> >> >> >  * security.
> >>> >> >> >  * relative ease of use, while retaining significant power.
> >>> >> >> > Challenging.  In particular, it should be possible to set up a
> >>> >> >> > modred network in under an hour, provided the computers are
> >>> >> >> > already set up.
> >>> >> >> >  * along with the previous bullet point, having an interface
> >>> >> >> > that lets one use the entire network like a single computer.
> >>> >> >> > This is sort of like the way google docs works, except the cloud
> >>> >> >> > is private
> >>> >> >> >  * therefore, it should be a multi-user system with
> >>> >> >> > well-designed privileges etc...
> >>> >> >> >
> >>> >> >> > I'm not going to discuss my implementation ideas, let's hear
> >>> >> >> > others first.
> >>> >> >> >
> >>> >> >> > --
> >>> >> >> > Scott Lawrence
> >>> >> >> >
> >>> >> >> > Webmaster
> >>> >> >> > The Blair Robot Project
> >>> >> >> > Montgomery Blair High School
> >>> >> >> >
> >>> >> >> >
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >> _______________________________________________
> >>> >> >>  Mailing list: https://launchpad.net/~modred
> >>> >> >>  Post to     : modred@xxxxxxxxxxxxxxxxxxx
> >>> >> >>  Unsubscribe : https://launchpad.net/~modred
> >>> >> >>  More help   : https://help.launchpad.net/ListHelp
> >>> >> >>
> >>> >> >>
> >>> >> >
