modred team mailing list archive

Re: Ideas

To: modred@xxxxxxxxxxxxxxxxxxx
From: Michael Cohen <gnurdux@xxxxxxxxx>
Date: Tue, 29 Dec 2009 17:14:41 -0500
In-reply-to: <53a52e1f0912291404m50cdb4d5tba20a913b5b73aa@mail.gmail.com>
User-agent: Mozilla-Thunderbird 2.0.0.19 (X11/20090103)

SDL is the cross-platform graphics/sound library that NaCl (and many other cross-platform graphical programs) uses. Terminal-based programs don't use it. I will look for benchmarks sometime today; I think there are some that come with NaCl.


Michael Cohen

Scott Lawrence wrote:

What exactly is SDL, and why would any terminal-based programs use it?

Can you get us trustworthy benchmarks?


On 12/29/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

NaCl seems to be working as a sandbox.  The only thing that you need to
do that is strange with it is compile with sdl=none (this disables SDL)
so that the programs can't make graphical windows pop up.

Michael Cohen

Scott Lawrence wrote:

Great.  Just make sure it doesn't come with a performance hit.

On 12/29/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

So I *think* we might be able to make a cross-platform sandbox.  Well,
make is not quite the right word.  "Use".  Google's Native Client is
designed to make safe, cross-platform native code for use as a browser
plugin, but apparently the sandboxy part is available separately.  If we
can get this working, this would let the client send the *same native
code* to the Windows and Linux clients, and have them *both* execute it
safely (since it uses system-call-free code).  I am checking out the
source now.

Michael Cohen

Scott Lawrence wrote:

Short answer: no.

Long answer: We're not writing a C sandbox that works on
Windows/non-Unix.  That's the only restriction.  Windows people can
run the Java, Perl, Python, and whatever other clients we write.
Furthermore, they can submit code with a C client, but there had better
be a Java client to run code to give them CPU time credits (provided
that the cluster they're using has that feature enabled).

On 12/28/09, Frederic Koehler <fkfire@xxxxxxxxx> wrote:

Um...Are we going to require that all clients run linux?

On Mon, Dec 28, 2009 at 11:08 PM, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

ptrace tells you whenever the process tried to make a system call.  You
can then do whatever you want with that information, including
recording it and passing it on to the kernel or doing your own action.
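
As a rough illustration of the "do whatever you want with that
information" step, here is a minimal Python sketch of a per-call
policy.  It does not use ptrace itself; it only models the decision a
tracer would make once it has been told which call the process
attempted.  The `Action` names, the `POLICY` table, and the `decide`
helper are hypothetical, and the action chosen for each call is an
assumption made for the example.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"      # pass the call on to the kernel unchanged
    DENY = "deny"        # refuse the call and report an error to the program
    EMULATE = "emulate"  # run our own handler in place of the kernel


# Hypothetical policy table. The syscall names are real Linux calls,
# but which action each one gets is an assumption for illustration.
POLICY = {
    "read": Action.ALLOW,
    "write": Action.ALLOW,
    "exit_group": Action.ALLOW,
    "open": Action.EMULATE,
    "execve": Action.DENY,
    "socket": Action.DENY,
}


def decide(syscall_name, log):
    """Record the attempted call and return what the tracer should do."""
    # Anything not listed is denied, so new or unexpected calls fail closed.
    action = POLICY.get(syscall_name, Action.DENY)
    log.append((syscall_name, action))
    return action


log = []
for name in ["read", "open", "socket", "ptrace"]:
    print(name, "->", decide(name, log).value)
```

Denying by default means a call the policy author never thought about
is blocked rather than silently passed through.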


Michael Cohen

Scott Lawrence wrote:

Actually, not.  Ignore that last message.

Can you build a prototype that calls a specified function in place of
the kernel?

On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:

Ok.  I think that for the most part, we should block system calls.

On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

Very little to none at all if the code doesn't make many system calls.
I wouldn't expect it to make many anyway; the tasks that this is good
for shouldn't be ones that require much communication (because the
Internet is fairly slow; if it's always sending stuff and requiring
responses, that gives probably a .1 second latency each step at least),
so it's mostly just running on the CPU.  It would certainly add less
overhead for CPU-intensive things than, say, Java.
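
The latency argument above can be put as a back-of-the-envelope
calculation.  This is only a sketch: the 0.1 second figure is the rough
estimate quoted above, and the function name and the two example
workloads are made up for illustration.

```python
def communication_fraction(cpu_seconds, round_trips, latency=0.1):
    """Fraction of wall-clock time spent waiting on the network.

    latency is the delay per round trip in seconds; 0.1 is the rough
    figure from the discussion, not a measurement.
    """
    waiting = round_trips * latency
    total = cpu_seconds + waiting
    return waiting / total if total > 0 else 0.0


# Hypothetical workloads, chosen only to show the two extremes.
for label, cpu, trips in [("CPU-bound", 60.0, 2), ("chatty", 1.0, 100)]:
    share = communication_fraction(cpu, trips)
    print(f"{label}: {share:.1%} of the time is spent waiting")
```

With these made-up numbers the CPU-bound task spends well under 1% of
its time waiting, while the chatty one spends about 91%, which is why
tasks that need frequent responses are a poor fit.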

Michael Cohen

Scott Lawrence wrote:

And this is the only thing that needs to be done? How much will it
slow the code down? More importantly


On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

We can't actually block interrupts; that requires kernel-mode code.
Also, I think there are other mechanisms for system calls.

BUT

lucky for us, Linux (and other Unixes, but with slightly different
implementations) has a built-in way to intercept system calls.  It's
called ptrace, and it is what is used for the USACO sandbox.

Michael Cohen

Scott Lawrence wrote:

Oh. I see.

My first instinct is to say: "ban them!"  But it would be really nice
if most existing source code could run out-of-the-box on the cluster,
even if there wouldn't be a speedup.

I'm not planning on supporting C/C++ on Windows - that's way too much
trouble - so we only have to worry about Unix systems.  Are interrupts
the only things we would have to block?

On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

You are simply incorrect here.  The issue isn't library calls, it's
system calls.  Libc calls themselves use system calls, which are
interrupts.  You can do everything without touching libc.  You just do
the right stuff to the stack and do an interrupt or whatever.  The
library doesn't have some special way to access the kernel.

Michael Cohen

Scott Lawrence wrote:

You're all missing the point.  I'm claiming that, properly
implemented, Modred should require no sandboxing outside of what is
necessary to implement its logic.

So back to our good friends Alice, Bob, and Mallory.  Alice sends the
cluster (which means she directs it to the hub, but let's just
consider the cluster a big black box for now) some C source code.
This code does some strange stuff - lots of file i/o and memory
access.  What does the cluster do with this?

It links the program with its own special libraries.  Even inline
assembly has to call functions to interface with the hard drive and
allocate memory and such.  Malicious code that gets submitted to the
server will be sanitized in this fashion.  The only problem I see is
with illegal memory access - but I suspect this will be dealt with,
because the cluster has to analyze what data the client program
accesses anyway...

Now Bob wants to compile and link his program on his own computer.
Fine.  He uses a different (smaller, incidentally) set of libraries.
These libraries don't intercept every call of malloc and stuff - those
are run on his computer.  But if he wants to access cluster data, he
has to use special functions.  And he can't actually run code on the
cluster.

Now what does Mallory do again?


On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

Server-side I don't see an issue.  A safe VM (java's, lua's,
javascript's, .NET/mono, some other random thing) is basically what I
already said.  There are other sandboxing systems that are designed to
work on x86 native code, such as vx32 (I think I mentioned that also).
Many of these schemes (with the exception of vx32) have the advantage
that they also automatically make the code cross-platform.  Even vx32
is supposedly portable to Windows, but nobody has done it yet and I
have no idea if any of us have the expertise to.

Frederic Koehler wrote:

As far as sandboxing, server-side you can presumably rely on the
operating system's sandboxes (per-user or perhaps some more elaborate
mechanism like FreeBSD's jails).

But as soon as the cluster sends code out to clients, obviously there
is a big issue if we let them do whatever the hell they want.  Just
preventing assembly or anything like that simply doesn't work in C/C++
(not to mention it would be surprisingly hard/irritating), since the
code could still execute the system calls (you could try not linking
against libc, too, but then you _really_ have no portability :P).

System-call controlling is possible, but is either pretty unportable
(lots of x86 assembly stuff) or slow-ish (virtual machines).

That being said, if you completely separate client-sendable code from
server code, I think that allays a lot of the concerns.  Requiring
client-sendable code to be written for some safe VM (java's, lua's,
javascript's, .NET/mono, some other random thing) could avoid this.  In
addition, client-sendable code would intentionally be written with
knowledge of the sensitivity of the data it handles (i.e. not written
at all if the data is important).

On Mon, Dec 28, 2009 at 7:49 PM, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

I would still be happier if there were a sandbox, actually.  There are
ways of getting around that sort of thing that are too complicated to
prevent at the source level IMO.  For instance, you can use inline
assembly.  So we block inline assembly.  That's all well and good, but
now we've blocked people using legitimate assembly optimizations.
Worse, what happens if they execute some shellcody stuff, allowing them
to escape?  I don't really know how to block that at all.  On the other
hand, a sandbox would not add much overhead since these tasks will most
likely use lots of CPU time but few system calls or whatever.


Michael Cohen

Scott Lawrence wrote:

Ok, I'm going to build a prototype of my privacy model.  I'm not going
to implement the challenge-response stuff; I'll assume there's an
implementation of that and that it works.

I think I've isolated the misunderstanding about the sandboxes.  You
don't submit binary code to the Modred cluster - you either submit
source, to be linked by the Modred cluster with the relevant
libraries, or you link the code yourself with the libraries.  The
libraries that you would link with merely copy the program over to the
cluster, where it can be executed in a manner deemed fit by the code
there.

I suppose you could say that that is a sandbox. ;-)


On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

If you read my email more carefully, you will see that I am not
necessarily objecting to Scott's suggestion.  I say that it is not
necessary, but that it would be the only thing necessary to allow more
problem-specific privacy tasks to be used.  The need for a sandbox is
pretty simple.  If we make untrusted users able to ask for tasks, if
they upload code, then I don't want it running unsandboxed on my
computer.  Otherwise, their code could steal my files, wipe my
harddisk, install Windows or do other undesirable things.  If it is
sandboxed, then arbitrary code can be executed safely, as long as we
trust the sandbox.  Sandboxed environments are often also
cross-platform, another plus, since they typically replace or intercept
any kind of system call.

Michael Cohen

Scott Lawrence wrote:

Well, I'm glad someone expresses opinions I don't agree with...

I think Mikey's objection to privacy concerns is that it's so
problem-specific, we can't reasonably expect to have a general
implementation.  But if the user specifies which parts of the data are
private, the Modred hub just has to be sure to divvy up tasks in a way
that gives those bits of information only to the trusted, dedicated
servers.
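
The divvying-up rule described above could be sketched roughly as
follows.  The `Task` and `Worker` classes, the `private` and `trusted`
flags, and the round-robin choice are all assumptions for illustration;
the thread does not say how the hub would actually schedule work.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    private: bool  # True if the task touches data the user marked private


@dataclass
class Worker:
    name: str
    trusted: bool  # True for dedicated servers, False for volunteer clients


def assign(tasks, workers):
    """Round-robin assignment that keeps private tasks on trusted workers."""
    if not workers:
        raise ValueError("no workers available")
    trusted = [w for w in workers if w.trusted]
    if any(t.private for t in tasks) and not trusted:
        raise ValueError("private tasks need at least one trusted worker")

    plan = {}
    public_i = private_i = 0
    for task in tasks:
        if task.private:
            # Private data may only go to the trusted pool.
            plan[task.name] = trusted[private_i % len(trusted)].name
            private_i += 1
        else:
            # Public data may go to anyone, trusted or not.
            plan[task.name] = workers[public_i % len(workers)].name
            public_i += 1
    return plan


workers = [Worker("server-1", True), Worker("client-1", False)]
tasks = [Task("a", True), Task("b", False), Task("c", True), Task("d", False)]
print(assign(tasks, workers))
```

Refusing to schedule at all when no trusted worker exists is a
deliberate choice in this sketch: it fails loudly instead of leaking
private data to an untrusted client.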

For the purposes of clarity, I will be referring to dedicated servers
as simply "servers", and the central server as the "hub".

I don't see the need for a sandbox.  Could you present some specific
attacks that a sandbox would fix?


On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

It seems to me that dealing with privacy concerns is an extremely
problem-specific issue.  In any given case you need to work out how
much you can give to people without letting private information leak,
but the details vary greatly from problem to problem.  That isn't our
business, and I don't think we should concern ourselves with it too
much.  The way I see it there are two options:

1. make this designed for stuff without privacy concerns
      I think this is both the easiest and the best option.  I don't
really like the idea of a public, free service doing computations for
an evil corporation anyway; if it's being done BY the public it should
be done FOR the public.

2. add in a small amount of functionality designed to facilitate
dealing with privacy concerns
      At the level of this project, that would probably just be the
controls on what data gets sent to what people.  There might be reasons
for adding such controls anyway; some tasks could be designated for
only "trusted" users.

Either way I doubt that this will be a big issue.  I think maybe a
bigger issue is how to run arbitrary code efficiently and securely.

I see only a few solutions

      Don't allow arbitrary code, but only a defined set of tasks.  Or,
similarly, allow some "trusted" set of tasks, each separately ported to
each platform (like boinc).

      Use Java.  This lets us easily sandbox it and is cross-platform,
but sacrifices a bit on efficiency.  Also, Java can be annoying
(although other JVM languages would also work in this situation).

      There are ways of running cross-platform C/C++ code in a sandbox
as well.  One possibility is to use LLVM, although the LLVM developers
specifically say that LLVM is NOT designed to be used this way.
Another possibility is to use a sandboxed code system that works on
multiple operating systems but only on x86.  This includes things like
VX32, which is apparently portable to Windows, but hasn't been ported.
I don't know whether or not that sort of thing is within our abilities.
Another option might be Google Native Client; that is designed to be
used in a web browser but I don't know how hard it would be to "rip
out" the sandboxing/cross-OS x86 code stuff.

Michael Cohen

_______________________________________________
Mailing list: https://launchpad.net/~modred
Post to     : modred@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~modred
More help   : https://help.launchpad.net/ListHelp



--
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School

