modred team mailing list archive

Re: Fwd: Concept stuff

To: modred@xxxxxxxxxxxxxxxxxxx
From: Michael Cohen <gnurdux@xxxxxxxxx>
Date: Mon, 28 Dec 2009 22:51:59 -0500
In-reply-to: <53a52e1f0912281946v454dbd0cpc86220001beff43b@mail.gmail.com>
User-agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

The idea here is that if people are smart, stuff should basically acquire a "market value." Credits would be basically like money. The only issue is how credits originate, since someone needs to get them before people spend them. But you could probably make some "official" projects that you will automatically get credits at a certain rate for participating in.


Michael Cohen

Scott Lawrence wrote:

Oh, that's interesting.  It might actually work... Other comments?


On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

Wrong.  You need to have credits to pay people with.  And so you have to
spend more credits if you want to pay people more.

Michael Cohen

Scott Lawrence wrote:

No.  That leads to people trying to make their projects as valuable as
possible, and within 3 days, the whole system will be worthless.

Please remember that the mailing lists at launchpad don't perform
reply-to mangling. Instruct your client accordingly.

On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

I would actually award credits on a per-project basis.  Each project
simply chooses how many credits to award for each task.  Your computer
is offered so many credits for so much work; if it finishes it gets the
creds and otherwise not.  Trying to do it based on time is dumb because
we don't care about time, we care about computational power contributed.
  If someone gives me 5000 hours, but in a VM limited to run at .1% of
their Pentium 3 CPU, then that isn't worth as much to me as 5 hours on
someone's brand new quad core powerhouse.
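
A minimal sketch of the per-project scheme described above, assuming each project pays out of a credit balance it already holds. The names `Ledger`, `grant`, and `pay_for_task` are hypothetical and only illustrate the idea; nothing here is an agreed modred design.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Tracks credit balances for projects and contributing clients."""
    balances: dict = field(default_factory=dict)

    def grant(self, account: str, amount: int) -> None:
        # Assumption: credits enter the system only through explicit grants.
        self.balances[account] = self.balances.get(account, 0) + amount

    def pay_for_task(self, project: str, client: str,
                     offer: int, finished: bool) -> bool:
        """Move `offer` credits from project to client if the task finished."""
        if not finished:
            return False  # unfinished work earns nothing
        if self.balances.get(project, 0) < offer:
            return False  # a project cannot pay credits it does not hold
        self.balances[project] -= offer
        self.balances[client] = self.balances.get(client, 0) + offer
        return True
```

The payout depends only on whether the offered task was completed, not on how long the client spent on it, which is the point of awarding per task rather than per hour.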

Michael Cohen

Scott Lawrence wrote:

Some administrators will consider CPU time credits to be very
important, though. Especially if they want them to be
buyable/exchangeable for that stink green stuff.

Let's deal with this later - we could just validate the same way we do
right answer/wrong answer validation.




On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

---------- Forwarded message ----------
From: Frederic Koehler <fkfire@xxxxxxxxx>
Date: Mon, Dec 28, 2009 at 10:17 PM
Subject: Re: [Modred] Concept stuff
To: Scott Lawrence <bytbox@xxxxxxxxx>

Network capacity can be protected by batching responses. Normal clients
can save several computations and send one big response (like when
exiting) - this will have similar bandwidth to big computations but
avoid the potential for a really long computation wasting time on a
bunch of computers without being solved.
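
The batching idea can be sketched roughly as follows. `BatchingClient`, `max_batch`, and the `send` callable are hypothetical names; the actual transport is left out entirely.

```python
class BatchingClient:
    """Buffers finished results and sends them to the server in one message."""

    def __init__(self, send, max_batch=10):
        self.send = send            # callable taking a list of (task_id, result)
        self.max_batch = max_batch  # assumed threshold, not from the thread
        self.pending = []

    def record(self, task_id, result):
        # Store the result locally; only talk to the server when the batch is full.
        self.pending.append((task_id, result))
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        # Intended to be called on exit as well, so no finished work is lost.
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()
```

Each task stays small, so a client that dies loses at most one unsent batch, while the server sees one message per batch instead of one per task.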

On Mon, Dec 28, 2009 at 10:14 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

* Maybe. But that still makes it too easy to gain false credits.
* Yeah, I'm agreeing with this.  The hub delegates to the servers, and
the servers delegate to clients.
* This will overload network capacity, and I don't like it.  We should
be able to trust clients to make long computations (long=over 30
seconds).  Maybe clients could give servers hints on how long they'll
be on?

On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

Hah, my email died, wow...Wonder what happened to it....
Anyway, this is something like what I wrote before:

* CPU time credits can be very roughly estimated by averaging response
time. It's not all _that_ important anyway if nothing is a behemoth
task.

* The hub can immediately, upon establishing connection, redirect
client to a server. The server will still have to communicate with hub
somewhat, but it can only send stuff necessarily pertaining to the hub.

* Jobs should probably be many small tasks to avoid the risk of losing
a giant computation (since saving computation state is not
easy/generalizable). Beyond that, sending keep-alive packets is enough
to know when a client dies.
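
As a rough illustration of the keep-alive point, a server could track the last time each client was heard from. `HeartbeatMonitor` and its `timeout` parameter are hypothetical; timestamps are passed in explicitly so the sketch stays independent of any clock or network layer.

```python
class HeartbeatMonitor:
    """Records keep-alive times and reports clients that have gone quiet."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before a client counts as dead
        self.last_seen = {}

    def beat(self, client_id, now):
        # Called whenever a keep-alive packet arrives from client_id.
        self.last_seen[client_id] = now

    def dead_clients(self, now):
        # Any client silent for longer than the timeout is presumed gone.
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)
```

Tasks held by a client reported here would then be candidates for reassignment.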


On Mon, Dec 28, 2009 at 6:16 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

What?

On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

On Mon, Dec 28, 2009 at 12:45 AM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

"What happens when a client disconnects with unfinished work? Is the
work immediately reassigned, or does the server wait for a specified
period, etc. This could come up quite a lot because some clients will
just disconnect as soon as work they submitted is completed."

Ouch.  Good question.  Here's one solution: small tasks (expected time
<2 seconds) are always assigned to two or more clients/servers. If
both disconnect, reassign, if one disconnects, use the other guy's
answer. Large tasks, if a computer stops regularly checking in every
or so seconds, give that computer's results to date to another
computer.  So yeah, I think a client should have to make regular
reports to a server.
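
The small-task rule above (assign to two or more machines, use whichever answer comes back, reassign if none does) could look something like this. `resolve_small_task` and its return values are hypothetical, and the sketch assumes disconnected machines simply never appear in `answers`.

```python
def resolve_small_task(assigned, answers):
    """Decide what to do with a redundantly assigned small task.

    assigned: list of client ids the task was handed to
    answers:  dict mapping client id -> result, for clients that reported back
    Returns ("done", result) or ("reassign", None).
    """
    for client in assigned:
        if client in answers:
            # At least one machine stayed connected long enough to answer.
            return ("done", answers[client])
    # Every assigned machine disconnected without answering.
    return ("reassign", None)
```

This takes the first available answer on trust; checking that two answers agree is a separate validation question.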

Here's another problem: how do we tell how many CPU time credits to
grant a client?  We can't always tell how long a problem should take
beforehand.

Here's another problem: which computer should handle the clients? As
I've been thinking about this, there are three types of computers, the
single hub, the various dedicated servers (capable of storing
permanent data), and the clients.  (The hub is necessary - without it,
the performance of the cluster drastically decreases.) So clients
connect to the hub, and then the hub directs all computers.  But the
hub will get overloaded if 100 computers are checking in every 10
seconds to give it more data (and then the hub has to pass this on to
other servers for storage, etc...).  So at some point, the hub needs
to tell the client to talk to the server.  When?

Who wants to create that prototype?

On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

This is where we start building prototypes. However, just to keep the
theoretical side going: I disagree about the privacy issue.  Most
operations that would benefit from the CPU time of a cluster (notice
I'm not talking about the data storage and reliability benefits, which
aren't affected by the presence of clients) are not very private.
Rendering nice screensavers ("Electric Sheep", I think that one's
called), and hefty data sifting aren't private - who cares about the
screensavers, and the data is generally public anyway (of course if it
wasn't, it would be marked so).

Ray tracing and simulation could be more of an issue. Hypothetical
situation: Alice is simulating how wind will affect her proprietary
airplane design.  Naturally, she can't hand off the whole design, or
even parts of the design, to random client computers.  This is where
the windows programmer says, "so the client computers can't help
Alice."  But that's not true - as a bad example, what if the Modred
hub gave to a client computer 80 types of landing gear, and told the
computer, not to simulate something, but to solve a general formula
that could later be used in the computation in a trivial and quick
way?  If that client is evil, it will learn that Alice's airplane has
some sort of landing gear.

Somebody needs to create a prototype of a server that can create
arbitrary problems in some format, so we can all try to trick it. I
suggest lisp as the language, but it's up to the implementer.


On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

---------- Forwarded message ----------
From: Frederic Koehler <fkfire@xxxxxxxxx>
Date: Sun, 27 Dec 2009 23:21:28 -0500
Subject: Re: [Modred] Concept stuff
To: Scott Lawrence <bytbox@xxxxxxxxx>

 * This is sort-of a solution (while obviously less-than-optimal
security, some grid-computing stuff does this, like BOINC), however, it
turns out that this may require custom validation methods - for
example, it's normal for floating point values to be different on
different computers, and the same could apply for other computations.
 * A malicious client would only need to misbehave on certain problems
that a malicious user could designate (or recognize obvious fake
programs), allowing the fake program test to work.
    - A better idea would be to randomly reduplicate some computations
many times - the malicious client wouldn't notice anything, but could
easily be singled out

 * Thirdly is mostly the same thing I wrote before - only computations
that are said to be totally unimportant privacy wise could benefit
from client-side computing.
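
To make the validation point concrete, here is a sketch of comparing replicated floating-point results with a tolerance instead of exact equality, and flagging the clients that disagree. `flag_outliers` and `rel_tol` are hypothetical, and using the median as the reference assumes most of the replicas are honest; this is not how BOINC or any other existing system is claimed to do it.

```python
import math
from statistics import median


def flag_outliers(results, rel_tol=1e-6):
    """Return the clients whose answer disagrees with the replicated majority.

    results: dict mapping client id -> float result for the same computation
    """
    # Assumption: honest replicas dominate, so the median is a sane reference.
    reference = median(results.values())
    return sorted(client for client, value in results.items()
                  if not math.isclose(value, reference, rel_tol=rel_tol))
```

With a tolerance, `0.1 + 0.2` and `0.3` count as the same answer even though they differ in the last bit, while a deliberately wrong value still stands out.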

So I really think that client-side computation is only a good idea for
a small subset of problems (like the type that there already exist
massive grid computing solutions for, like SETI@HOME)

On Sun, Dec 27, 2009 at 10:57 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

I want clients to be used for computation, and I want maximum
privacy+security given that restriction.  Some ideas:

With a large network, two computers can perform the same computation.
Furthermore, a smart modred hub can give fake problems to clients,
just to make sure that they're operating correctly.  A client that
isn't operating correctly gets cut. (No second chances! A program
could exploit that!)

If a user specifies a certain bit of data (SSN, for instance) as
highly sensitive, modred should know not to hand off that computation
to a client. (If it does by accident, it certainly should never hand
off the data.) privacy++

In all cases, computations should be anonymous. privacy++

Other ideas?


On 12/27/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

The idea for client-side computation implies that we have
highly-trusted clients... (we know they won't provide invalid answers)
Otherwise, client-side computation requires verifying answers and so is
only useful for a few NP-ish problems. In addition (assuming trusted
clients).

Also, it means that, since computations can contain sensitive data,
the ability to spread the computation is limited - unless we know the
computation is not user-sensitive, it can only try to use the user's
client(s). This way we also know that the client has no interest in
sabotaging answers to mess with other users (except to exploit
server-side weaknesses, which is inevitable).

On Sat, Dec 26, 2009 at 10:28 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

Here is what, as I envision it, will make modred unique (and hard):

 * Support for clients who can come and leave, lending CPU time and
using CPU time as they choose.  There are some clusters that support
this, but not very many.
 * Support for computers participating across the internet. This goes
along with the previous part, but remember we need security to make
this worth anything. This also means that user data could potentially
be passed to untrusted computers - we need a way to prevent this.
 * The ability for clients to run on any OS, using perl, python, java,
or (on unix systems) C and C++ (servers and the hub will need to run
on linux or at least another unix, or a dedicated OS which we may
decide to write)
 * Modred has great ease of use because it acts as a single unified
computer - a special client program exists that allows one to log in,
access and edit files, etc...  This is very close to unique - google
has it, though

Because of that last point, many OS design issues should come up when
we code modred. (I think Freddy pointed this out?) Thus, we have a
chance to fix flaws in standard unix, incorporating plan 9-type stuff
(google it and read about it - Plan 9 from Bell Labs, the way the
future of unix was) while also creating an actually usable user
interface. (No offense, but to a newbie non-super-technical user,
linux is a bit harsh...)

Some implementation questions and ideas:

 - how will updates be handled?  Remember we've got 200 computers
potentially, some of which might be clients that want to participate
in multiple clusters.

 - maybe we should have programs not include front ends.  Instead, the
modred software creates a front-end from the program's self
description.  This would enforce a consistent user interface if we
could implement it well

 - how can we keep users from being able to snoop on each others'
data?

That's just a sample to get people thinking.


On 12/26/09, David Tolnay <dtolnay@xxxxxxxxx> wrote:

Before diving in to specifics about the implementation I think we need
to decide how we want modred to be different from (read: better than)
existing bootable cluster environments. Here is a short list to check
out:

Bootable Cluster CD (http://bccd.net/) - folks presented this at SC09
in portland, it was pretty neat stuff. Packed with education
debugging / visualization features

Oscar (http://svn.oscar.openclustergroup.org/trac/oscar) - very
trivially simple way to transform an existing unix lab into a cluster
resource

Lnx-bbc (http://www.lnx-bbc.com/) - includes cowsay!

Perceus/warewulf (http://www.perceus.org/portal/) - a lot of other
sites made reference to this, haven't read too much about it

What specifically do you want to improve over any of these?


On 12/25/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

So, as far as I understand this project, the idea is to build both a
client library and a program using the library to do clustering stuff,
along with matching server/hub foo (the library might be the same or
whatever, not important).

So from this understanding, it seems that the system should provide
some basic pseudo-operating system stuff and programs can build on
that, just like they would normally build on their local libc/kernel
and stuff.

So (I sure like the word "so" today...) if we want the type of general
os-like stuff it seems there needs to be support for:
   * A simple message passing model - abstract away all the TCP-foo,
     maybe use existing foo here (obviously needs fleshing out)
   * Permanent storage IO (clone the unix write(), read(), open() and
     sync() model, or maybe just use one of the existing database-ish
     nosql things out there)
        - Unix-ish model - you create your data hunk, say you want all
          this stuff in it, then after sync() we know it's actually
          somewhere written on a hard-drive, and other things can read
          it too
        - Unless this isn't in fact needed (but I assume it is)
        - Also need to figure out if it's filesystem-ish foo
          (hierarchical) we want or more relational database-ish stuff

   * A task delegation model - some type of map/reduce-ish stuff
        - Servers have a few built-in computations, and client
          utilizes them?
        - Or more complex, servers run sandboxed computational code?

   * A security system?
        - Needs fleshing out
        - Presumably what the "hub" manages - it's the trusted thing
        - Obviously, not everybody is allowed to use the cluster for
          computation, not everybody can find out what everybody else
          is doing, etc.
        - But also, is there a limit on storage, are some things
          prioritized over others, ?
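
For the "servers have a few built-in computations" variant of task delegation, a map/reduce-ish sketch might look like this. `BUILTINS`, `delegate`, and `n_servers` are hypothetical; the "servers" here are just slices of a list processed in one process, standing in for real remote machines.

```python
from functools import reduce

# Hypothetical registry of computations every server is assumed to ship with.
BUILTINS = {
    "square": lambda x: x * x,
    "add": lambda a, b: a + b,
}


def delegate(map_name, reduce_name, items, n_servers=3):
    """Split items across servers, map on each, then reduce the results."""
    # Round-robin split: server i gets items i, i + n_servers, ...
    chunks = [items[i::n_servers] for i in range(n_servers)]
    # Each "server" applies the named built-in to its own chunk.
    partials = [[BUILTINS[map_name](x) for x in chunk] for chunk in chunks]
    # The caller combines all partial results with the named reduction.
    flat = [y for partial in partials for y in partial]
    return reduce(BUILTINS[reduce_name], flat)
```

Because clients only name a built-in rather than ship code, the servers never run untrusted programs, which is the trade-off against the sandboxed-code option.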

Theoretically, servers are written to provide the io backend and to
allow for task delegation, clients use the api, although hub has its
work cut out delegating all the file io and figuring out what the
state of that is.

On top of some mixture of this, one could build a simple unix-ish
pseudo-cli, theoretically, as well as real software.

Anyway, before actually doing anything, people should read about PVM
(Parallel Virtual Machine) and the like (maybe also Hadoop and other
foo-ish stuff) so Modred isn't just a bad clone of it

Anyway, (yes, twice in a row!), I figured _someone_ had to respond to
Scott, otherwise he'd feel all lonely and sad :P Now he can have a
warm fuzzy feeling of deep confusion and uncertainty instead :P



On Fri, Dec 25, 2009 at 11:06 PM, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

---------- Forwarded message ----------
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Fri, 25 Dec 2009 19:20:13 -0500
Subject: Design Overview
To: modred <modred@xxxxxxxxxxxxxxxxxxx>

I'm going to assume that everyone understands the basic concepts for
modred: a set of networked computers (by 'networked' I mean, they're
all on the internet), divided for the sake of discussion into three
classes: the 'hub' (the dude in charge, who computers who want to
join connect to), the 'servers' (dedicated computers that can be
pretty much relied on not to go down, although redundancy is always
nice), and the 'clients' (computers that send in requests and can be
used for spare CPU cycles).

Ok, so much for assumptions... :-)

Things *I* think any design should emphasize:
 * security.
 * relative ease of use, while retaining significant power.
Challenging.  In particular, it should be possible to set up a modred
network in under an hour, provided the computers are already set up.
 * along with the previous bullet point, having an interface that lets
one use the entire network like a single computer.  This is sort of
like the way google docs works, except the cloud is private
 * therefore, it should be a multi-user system with well-designed
privileges etc...

I'm not going to discuss my implementation ideas, let's hear others
first.

--
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School




_______________________________________________
 Mailing list: https://launchpad.net/~modred
 Post to     : modred@xxxxxxxxxxxxxxxxxxx
 Unsubscribe : https://launchpad.net/~modred
 More help   : https://help.launchpad.net/ListHelp

