modred team mailing list archive

Re: Ideas

To: Michael Cohen <gnurdux@xxxxxxxxx>
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Mon, 28 Dec 2009 23:04:57 -0500
Cc: modred@xxxxxxxxxxxxxxxxxxx
In-reply-to: <4B397F7B.9080801@gmail.com>
Ok.  I think that for the most part, we should block system calls.
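
A rough sketch of what "block most system calls" could mean in practice, assuming the sandbox (for example a ptrace-based tracer, as suggested in the quoted mail) reports each system call by name before it runs. The contents of the allowlist and the names decide and filter_trace are made up for illustration; this covers only the policy decision, not the interception itself.

```python
# Hypothetical policy table. The names are real Linux system calls, but the
# choice of which ones to allow is an assumption made for this sketch.
ALLOWED_SYSCALLS = frozenset(
    {"read", "write", "brk", "mmap", "munmap", "exit", "exit_group"}
)


def decide(syscall_name: str) -> str:
    """Return "allow" for allowlisted calls and "deny" for everything else."""
    return "allow" if syscall_name in ALLOWED_SYSCALLS else "deny"


def filter_trace(syscall_names):
    """Pair each observed system call with the policy decision for it."""
    return [(name, decide(name)) for name in syscall_names]


if __name__ == "__main__":
    observed = ["read", "open", "write", "fork", "exit_group"]
    for name, verdict in filter_trace(observed):
        print(f"{name}: {verdict}")
```

Denying by default means a system call nobody thought about is blocked rather than silently permitted, which fits the "for the most part" above.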

On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
> Very little, or not at all, if the code doesn't make many system calls.  I
> wouldn't expect it to make many anyway; the tasks that this is good for
> shouldn't be ones that require much communication (because the Internet
> is fairly slow; if it's always sending stuff and requiring responses,
> that probably gives a latency of at least .1 seconds each step), so it's
> mostly just running on the CPU.  It would certainly add less overhead
> for CPU-intensive things than, say, Java.
>
> Michael Cohen
>
> Scott Lawrence wrote:
>> And this is the only thing that needs to be done? How much will it
>> slow the code down? More importantly
>>
>>
>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>> We can't actually block interrupts; that requires kernel-mode code.
>>> Also, I think there are other mechanisms for system calls.
>>>
>>> BUT
>>>
>>> lucky for us, Linux (and other unixes, but with slightly different
>>> implementations) has a built-in way to intercept system calls.  It's
>>> called ptrace, and it is what is used for the USACO sandbox.
>>>
>>> Michael Cohen
>>>
>>> Scott Lawrence wrote:
>>>> Oh. I see.
>>>>
>>>> My first instinct is to say: "ban them!"  But it would be really nice
>>>> if most existing source code could run out-of-the-box on the cluster,
>>>> even if there wouldn't be a speedup.
>>>>
>>>> I'm not planning on supporting C/C++ on Windows - that's way too much
>>>> trouble - so we only have to worry about unix systems.  Are interrupts
>>>> the only things we would have to block?
>>>>
>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>> You are simply incorrect here.  The issue isn't library calls, it's
>>>>> system calls.  Libc calls themselves use system calls, which are
>>>>> interrupts.  You can do everything without touching libc.  You just do
>>>>> the right stuff to the stack and do an interrupt or whatever.  The
>>>>> library doesn't have some special way to access the kernel.
>>>>>
>>>>> Michael Cohen
>>>>>
>>>>> Scott Lawrence wrote:
>>>>>> You're all missing the point.  I'm claiming that, properly
>>>>>> implemented, Modred should require no sandboxing outside of what is
>>>>>> necessary to implement its logic.
>>>>>>
>>>>>> So back to our good friends Alice, Bob, and Mallory.  Alice sends the
>>>>>> cluster (which means she directs it to the hub, but let's just
>>>>>> consider the cluster a big black box for now) some C source code.
>>>>>> This code does some strange stuff - lots of file i/o and memory
>>>>>> access.  What does the cluster do with this?
>>>>>>
>>>>>> It links the program with its own special libraries.  Even inline
>>>>>> assembly has to call functions to interface with the hard drive and
>>>>>> allocate memory and such. Malicious code that gets submitted to the
>>>>>> server will be sanitized in this fashion.  The only problem I see is
>>>>>> with illegal memory access - but I suspect this will be dealt with,
>>>>>> because the cluster has to analyze what data the client program
>>>>>> accesses anyway...
>>>>>>
>>>>>> Now Bob wants to compile and link his program on his own computer.
>>>>>> Fine.  He uses a different (smaller, incidentally) set of libraries.
>>>>>> These libraries don't intercept every call of malloc and stuff - those
>>>>>> are run on his computer.  But if he wants to access cluster data, he
>>>>>> has to use special functions.  And he can't actually run code on the
>>>>>> cluster.
>>>>>>
>>>>>> Now what does Mallory do again?
>>>>>>
>>>>>>
>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>> Server-side I don't see an issue.  "(java's, lua's, javascript's,
>>>>>>> .NET/mono, some other random thing)" is basically what I already
>>>>>>> said.  There are other sandboxing systems that are designed to work
>>>>>>> on x86 native code, such as vx32 (I think I mentioned that also).
>>>>>>> Many of these schemes (with the exception of vx32) have the advantage
>>>>>>> that they also automatically make the code cross-platform.  Even vx32
>>>>>>> is supposedly portable to Windows, but nobody has done it yet and I
>>>>>>> have no idea if any of us have the expertise to.
>>>>>>>
>>>>>>> Frederic Koehler wrote:
>>>>>>>> As far as sandboxing, server-side you can presumably rely on the
>>>>>>>> operating system's sandboxes (per-user or perhaps some more elaborate
>>>>>>>> mechanism like FreeBSD's jails).
>>>>>>>>
>>>>>>>> But as soon as the cluster sends code out to clients, obviously there
>>>>>>>> is a big issue if we let them do whatever the hell they want. Just
>>>>>>>> preventing assembly or anything like that simply doesn't work in
>>>>>>>> C/C++ (not to mention it would be surprisingly hard/irritating),
>>>>>>>> since the code could still execute the system calls (you could try
>>>>>>>> not linking against libc, too, but then you _really_ have no
>>>>>>>> portability :P).
>>>>>>>>
>>>>>>>> System-call controlling is possible, but is either pretty unportable
>>>>>>>> (lots of x86 assembly stuff) or slow-ish (virtual machines).
>>>>>>>>
>>>>>>>> That being said, if you completely separate client-sendable code from
>>>>>>>> server code, I think that allays a lot of the concerns. Requiring
>>>>>>>> client-sendable code to be written for some safe VM (java's, lua's,
>>>>>>>> javascript's, .NET/mono, some other random thing) could avoid this.
>>>>>>>> In addition, client-sendable code would intentionally be written with
>>>>>>>> knowledge of the sensitivity of the data it handles (i.e. not written
>>>>>>>> at all if the data is important).
>>>>>>>>
>>>>>>>> On Mon, Dec 28, 2009 at 7:49 PM, Michael Cohen <gnurdux@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I would still be happier if there were a sandbox, actually.  There
>>>>>>>>> are ways of getting around that sort of thing that are too
>>>>>>>>> complicated to prevent at the source level IMO.  For instance, you
>>>>>>>>> can use inline assembly.  So we block inline assembly.  That's all
>>>>>>>>> well and good, but now we've blocked people using legitimate
>>>>>>>>> assembly optimizations. Worse, what happens if they execute some
>>>>>>>>> shellcody stuff, allowing them to escape?  I don't really know how
>>>>>>>>> to block that at all.  On the other hand, a sandbox would not add
>>>>>>>>> much overhead since these tasks will most likely use lots of CPU
>>>>>>>>> time but few system calls or whatever.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Michael Cohen
>>>>>>>>>
>>>>>>>>> Scott Lawrence wrote:
>>>>>>>>>
>>>>>>>>>> Ok, I'm going to build a prototype of my privacy model.  I'm not
>>>>>>>>>> going to implement the challenge-response stuff, I'll assume
>>>>>>>>>> there's an implementation of that and that it works.
>>>>>>>>>>
>>>>>>>>>> I think I've isolated the misunderstanding about the sandboxes.
>>>>>>>>>> You don't submit binary code to the Modred cluster - you either
>>>>>>>>>> submit source, to be linked by the Modred cluster with the relevant
>>>>>>>>>> libraries, or you link the code yourself with the libraries.  The
>>>>>>>>>> libraries that you would link with merely copy the program over to
>>>>>>>>>> the cluster, where it can be executed in a manner deemed fit by the
>>>>>>>>>> code there.
>>>>>>>>>>
>>>>>>>>>> I suppose you could say that that is a sandbox. ;-)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>>> If you read my email more carefully, you will see that I am not
>>>>>>>>>>> necessarily objecting to Scott's suggestion.  I say that it is not
>>>>>>>>>>> necessary, but that it would be the only thing necessary to allow
>>>>>>>>>>> more problem-specific privacy tasks to be used.  The need for a
>>>>>>>>>>> sandbox is pretty simple.  If we make untrusted users able to ask
>>>>>>>>>>> for tasks, if they upload code, then I don't want it running
>>>>>>>>>>> unsandboxed on my computer.  Otherwise, their code could steal my
>>>>>>>>>>> files, wipe my hard disk, install Windows or do other undesirable
>>>>>>>>>>> things.  If it is sandboxed, then arbitrary code can be executed
>>>>>>>>>>> safely, as long as we trust the sandbox.  Sandboxed environments
>>>>>>>>>>> are often also cross-platform, another plus, since they typically
>>>>>>>>>>> replace or intercept any kind of system call.
>>>>>>>>>>>
>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>
>>>>>>>>>>> Scott Lawrence wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Well, I'm glad someone expresses opinions I don't agree with...
>>>>>>>>>>>>
>>>>>>>>>>>> I think Mikey's objection to privacy concerns is that it's so
>>>>>>>>>>>> problem-specific, we can't reasonably expect to have a general
>>>>>>>>>>>> implementation.  But if the user specifies which parts of the
>>>>>>>>>>>> data are private, the Modred hub just has to be sure to divvy up
>>>>>>>>>>>> tasks in a way that gives those bits of information only to the
>>>>>>>>>>>> trusted, dedicated servers.
>>>>>>>>>>>>
>>>>>>>>>>>> For the purposes of clarity, I will be referring to dedicated
>>>>>>>>>>>> servers as simply "servers", and the central server as the "hub".
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see the need for a sandbox.  Could you present some
>>>>>>>>>>>> specific attacks that a sandbox would fix?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> It seems to me that dealing with privacy concerns is an
>>>>>>>>>>>>> extremely problem-specific issue.  In any given case you need to
>>>>>>>>>>>>> work out how much you can give to people without letting private
>>>>>>>>>>>>> information leak, but the details vary greatly from problem to
>>>>>>>>>>>>> problem.  That isn't our business, and I don't think we should
>>>>>>>>>>>>> concern ourselves with it too much.  The way I see it there are
>>>>>>>>>>>>> two options:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. make this designed for stuff without privacy concerns
>>>>>>>>>>>>>        I think this is both the easiest and the best option.  I
>>>>>>>>>>>>>        don't really like the idea of a public, free service doing
>>>>>>>>>>>>>        computations for an evil corporation anyway; if it's being
>>>>>>>>>>>>>        done BY the public it should be done FOR the public.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2. add in a small amount of functionality designed to facilitate
>>>>>>>>>>>>>    dealing with privacy concerns
>>>>>>>>>>>>>        At the level of this project, that would probably just be
>>>>>>>>>>>>>        the controls on what data gets sent to what people.  There
>>>>>>>>>>>>>        might be reasons for adding such controls anyway; some
>>>>>>>>>>>>>        tasks could be designated for only "trusted" users.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Either way I doubt that this will be a big issue.  I think maybe
>>>>>>>>>>>>> a bigger issue is how to run arbitrary code efficiently and
>>>>>>>>>>>>> securely.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I see only a few solutions
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Don't allow arbitrary code, but only a defined set of
>>>>>>>>>>>>> tasks.  Or, similarly, allow some "trusted" set of tasks, each
>>>>>>>>>>>>> separately ported to each platform (like boinc).
>>>>>>>>>>>>>
>>>>>>>>>>>>>        Use Java.  This lets us easily sandbox it and is
>>>>>>>>>>>>> cross-platform, but sacrifices a bit on efficiency.  Also, Java
>>>>>>>>>>>>> can be annoying (although other JVM languages would also work in
>>>>>>>>>>>>> this situation).
>>>>>>>>>>>>>
>>>>>>>>>>>>>        There are ways of running cross-platform, C/C++ code in a
>>>>>>>>>>>>> sandbox as well.  One possibility is to use LLVM, although the
>>>>>>>>>>>>> LLVM developers specifically say that LLVM is NOT designed to be
>>>>>>>>>>>>> used this way.  Another possibility is to use a sandboxed code
>>>>>>>>>>>>> system that works on multiple operating systems but only on x86.
>>>>>>>>>>>>> This includes things like VX32, which is apparently portable to
>>>>>>>>>>>>> Windows, but hasn't been ported.  I don't know whether or not
>>>>>>>>>>>>> that sort of thing is within our abilities.  Another option
>>>>>>>>>>>>> might be Google Native Client; that is designed to be used in a
>>>>>>>>>>>>> web browser but I don't know how hard it would be to "rip out"
>>>>>>>>>>>>> the sandboxing/cross-OS x86 code stuff.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Mailing list: https://launchpad.net/~modred
>>>>>>>>>>>>> Post to     : modred@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>>> Unsubscribe : https://launchpad.net/~modred
>>>>>>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>
>>>
>>
>>
>
>


-- 
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School
Follow ups

Re: Ideas
From: Scott Lawrence, 2009-12-29
References

Ideas
From: Michael Cohen, 2009-12-28
Re: Ideas
From: Michael Cohen, 2009-12-29
Re: Ideas
From: Frederic Koehler, 2009-12-29
Re: Ideas
From: Michael Cohen, 2009-12-29
Re: Ideas
From: Scott Lawrence, 2009-12-29
Re: Ideas
From: Michael Cohen, 2009-12-29
Re: Ideas
From: Scott Lawrence, 2009-12-29
Re: Ideas
From: Michael Cohen, 2009-12-29
Re: Ideas
From: Scott Lawrence, 2009-12-29
Re: Ideas
From: Michael Cohen, 2009-12-29