modred team mailing list archive

Re: Ideas

To: Michael Cohen <gnurdux@xxxxxxxxx>
From: Scott Lawrence <bytbox@xxxxxxxxx>
Date: Mon, 28 Dec 2009 23:01:08 -0500
Cc: modred@xxxxxxxxxxxxxxxxxxx
In-reply-to: <53a52e1f0912282000t3059239fk14485a19350b291d@mail.gmail.com>

And this is the only thing that needs to be done?  How much will it
slow the code down?  More importantly, how will it affect code that
doesn't use system calls?

(sorry for that bad email above)
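
One rough way to frame the slowdown question: assuming, as Michael argues
further down the thread, that a ptrace-style sandbox only costs extra time
when the traced program makes a system call, the relative slowdown is
roughly (number of system calls x extra cost per call) / CPU time.  A
minimal Python sketch of that arithmetic follows.  The function name and
the per-call cost are placeholders for illustration, not measurements of
ptrace.

```python
def sandbox_slowdown(cpu_seconds, syscall_count, extra_seconds_per_syscall):
    """Estimate the fractional slowdown caused by intercepting system calls.

    Assumes the sandbox adds a fixed cost to each system call and nothing
    to ordinary computation. Both are simplifying assumptions.
    """
    if cpu_seconds <= 0:
        raise ValueError("cpu_seconds must be positive")
    if syscall_count < 0 or extra_seconds_per_syscall < 0:
        raise ValueError("counts and costs must be non-negative")
    return (syscall_count * extra_seconds_per_syscall) / cpu_seconds


# Placeholder cost per intercepted call; a real figure would have to be measured.
ASSUMED_EXTRA_SECONDS_PER_SYSCALL = 10e-6

if __name__ == "__main__":
    # A CPU-bound task: one minute of computation, a thousand system calls.
    print(sandbox_slowdown(60.0, 1_000, ASSUMED_EXTRA_SECONDS_PER_SYSCALL))
    # Code that makes no system calls pays nothing under this model.
    print(sandbox_slowdown(60.0, 0, ASSUMED_EXTRA_SECONDS_PER_SYSCALL))
```

Under this model, code that never makes a system call is not slowed at
all; an I/O-heavy program is where the cost would show up.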

On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
> And this is the only thing that needs to be done? How much will it
> slow the code down? More importantly
>
>
> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>> We can't actually block interrupts; that requires kernel-mode code.
>> Also, I think there are other mechanisms for system calls.
>>
>> BUT
>>
>> lucky for us, Linux (and other unixes, but with slightly different
>> implementations) has a built-in way to intercept system calls.  It's
>> called ptrace, and it is what is used for the USACO sandbox.
>>
>> Michael Cohen
>>
>> Scott Lawrence wrote:
>>> Oh. I see.
>>>
>>> My first instinct is to say: "ban them!"  But it would be really nice
>>> if most existing source code could run out-of-the-box on the cluster,
>>> even if there wouldn't be a speedup.
>>>
>>> I'm not planning on supporting C/C++ on Windows - that's way too much
>>> trouble - so we only have to worry about unix systems.  Are interrupts
>>> the only things we would have to block?
>>>
>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>> You are simply incorrect here.  The issue isn't library calls, it's
>>>> system calls.  Libc calls themselves use system calls, which are
>>>> interrupts.  You can do everything without touching libc.  You just do
>>>> the right stuff to the stack and do an interrupt or whatever.  The
>>>> library doesn't have some special way to access the kernel.
>>>>
>>>> Michael Cohen
>>>>
>>>> Scott Lawrence wrote:
>>>>> You're all missing the point.  I'm claiming that, properly
>>>>> implemented, Modred should require no sandboxing outside of what is
>>>>> necessary to implement its logic.
>>>>>
>>>>> So back to our good friends Alice, Bob, and Mallory.  Alice sends the
>>>>> cluster (which means she directs it to the hub, but let's just
>>>>> consider the cluster a big black box for now) some C source code.
>>>>> This code does some strange stuff - lots of file i/o and memory
>>>>> access.  What does the cluster do with this?
>>>>>
>>>>> It links the program with its own special libraries.  Even inline
>>>>> assembly has to call functions to interface with the hard drive and
>>>>> allocate memory and such. Malicious code that gets submitted to the
>>>>> server will be sanitized in this fashion.  The only problem I see is
>>>>> with illegal memory access - but I suspect this will be dealt with,
>>>>> because the cluster has to analyze what data the client program
>>>>> accesses anyway...
>>>>>
>>>>> Now Bob wants to compile and link his program on his own computer.
>>>>> Fine.  He uses a different (smaller, incidentally) set of libraries.
>>>>> These libraries don't intercept every call of malloc and stuff - those
>>>>> are run on his computer.  But if he wants to access cluster data, he
>>>>> has to use special functions.  And he can't actually run code on the
>>>>> cluster.
>>>>>
>>>>> Now what does Mallory do again?
>>>>>
>>>>>
>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>> Server-side I don't see an issue.  (java's, lua's, javascript's,
>>>>>> .NET/mono, some other random thing) is basically what I already
>>>>>> said.  There are other sandboxing systems that are designed to work
>>>>>> on x86 native code, such as vx32 (I think I mentioned that also).
>>>>>> Many of these schemes (with the exception of vx32) have the
>>>>>> advantage that they also automatically make the code
>>>>>> cross-platform.  Even vx32 is supposedly portable to Windows, but
>>>>>> nobody has done it yet and I have no idea if any of us have the
>>>>>> expertise to.
>>>>>>
>>>>>> Frederic Koehler wrote:
>>>>>>> As far as sandboxing, server-side you can presumably rely on the
>>>>>>> operating system's sandboxes (per-user or perhaps some more
>>>>>>> elaborate mechanism like FreeBSD's jails).
>>>>>>>
>>>>>>> But as soon as the cluster sends code out to clients, obviously
>>>>>>> there is a big issue if we let them do whatever the hell they want.
>>>>>>> Just preventing assembly or anything like that simply doesn't work
>>>>>>> in C/C++ (not to mention it would be surprisingly hard/irritating),
>>>>>>> since the code could still execute the system-calls (you could try
>>>>>>> not linking against libc, too, but then you _really_ have no
>>>>>>> portability :P).
>>>>>>>
>>>>>>> System-call controlling is possible, but is either pretty
>>>>>>> unportable (lots of x86 assembly stuff) or slow-ish (virtual
>>>>>>> machines).
>>>>>>>
>>>>>>> That being said, if you completely separate client-sendable code
>>>>>>> from server-code, I think that allays a lot of the concerns.
>>>>>>> Requiring client-sendable code to be written for some safe VM
>>>>>>> (java's, lua's, javascript's, .NET/mono, some other random thing)
>>>>>>> could avoid this. In addition, client-sendable code would
>>>>>>> intentionally be written with knowledge of the sensitivity of the
>>>>>>> data it handles (i.e. not written at all if the data is important).
>>>>>>>
>>>>>>> On Mon, Dec 28, 2009 at 7:49 PM, Michael Cohen <gnurdux@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I would still be happier if there were a sandbox, actually.  There
>>>>>>>> are ways of getting around that sort of thing that are too
>>>>>>>> complicated to prevent at the source level IMO.  For instance, you
>>>>>>>> can use inline assembly.  So we block inline assembly.  That's all
>>>>>>>> well and good, but now we've blocked people using legitimate
>>>>>>>> assembly optimizations.  Worse, what happens if they execute some
>>>>>>>> shellcody stuff, allowing them to escape?  I don't really know how
>>>>>>>> to block that at all.  On the other hand, a sandbox would not add
>>>>>>>> much overhead since these tasks will most likely use lots of CPU
>>>>>>>> time but few system calls or whatever.
>>>>>>>>
>>>>>>>>
>>>>>>>> Michael Cohen
>>>>>>>>
>>>>>>>> Scott Lawrence wrote:
>>>>>>>>
>>>>>>>>> Ok, I'm going to build a prototype of my privacy model.  I'm not
>>>>>>>>> going to implement the challenge-response stuff; I'll assume
>>>>>>>>> there's an implementation of that and that it works.
>>>>>>>>>
>>>>>>>>> I think I've isolated the misunderstanding about the sandboxes.
>>>>>>>>> You don't submit binary code to the Modred cluster - you either
>>>>>>>>> submit source, to be linked by the Modred cluster with the
>>>>>>>>> relevant libraries, or you link the code yourself with the
>>>>>>>>> libraries.  The libraries that you would link with merely copy
>>>>>>>>> the program over to the cluster, where it can be executed in a
>>>>>>>>> manner deemed fit by the code there.
>>>>>>>>>
>>>>>>>>> I suppose you could say that that is a sandbox. ;-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> If you read my email more carefully, you will see that I am not
>>>>>>>>>> necessarily objecting to Scott's suggestion.  I say that it is
>>>>>>>>>> not necessary, but that it would be the only thing necessary to
>>>>>>>>>> allow more problem-specific privacy tasks to be used.  The need
>>>>>>>>>> for a sandbox is pretty simple.  If we let untrusted users ask
>>>>>>>>>> for tasks and upload code, then I don't want that code running
>>>>>>>>>> unsandboxed on my computer.  Otherwise, their code could steal
>>>>>>>>>> my files, wipe my hard disk, install Windows or do other
>>>>>>>>>> undesirable things.  If it is sandboxed, then arbitrary code can
>>>>>>>>>> be executed safely, as long as we trust the sandbox.  Sandboxed
>>>>>>>>>> environments are often also cross-platform, another plus, since
>>>>>>>>>> they typically replace or intercept any kind of system call.
>>>>>>>>>>
>>>>>>>>>> Michael Cohen
>>>>>>>>>>
>>>>>>>>>> Scott Lawrence wrote:
>>>>>>>>>>
>>>>>>>>>>> Well, I'm glad someone expresses opinions I don't agree with...
>>>>>>>>>>>
>>>>>>>>>>> I think Mikey's objection to privacy concerns is that it's so
>>>>>>>>>>> problem-specific, we can't reasonably expect to have a general
>>>>>>>>>>> implementation.  But if the user specifies which parts of the
>>>>>>>>>>> data are private, the Modred hub just has to be sure to divvy
>>>>>>>>>>> up tasks in a way that gives those bits of information only to
>>>>>>>>>>> the trusted, dedicated servers.
>>>>>>>>>>>
>>>>>>>>>>> For the purposes of clarity, I will be referring to dedicated
>>>>>>>>>>> servers as simply "servers", and the central server as the
>>>>>>>>>>> "hub".
>>>>>>>>>>>
>>>>>>>>>>> I don't see the need for a sandbox.  Could you present some
>>>>>>>>>>> specific attacks that a sandbox would fix?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> It seems to me that dealing with privacy concerns is an
>>>>>>>>>>>> extremely problem-specific issue.  In any given case you need
>>>>>>>>>>>> to work out how much you can give to people without letting
>>>>>>>>>>>> private information leak, but the details vary greatly from
>>>>>>>>>>>> problem to problem.  That isn't our business, and I don't
>>>>>>>>>>>> think we should concern ourselves with it too much.  The way
>>>>>>>>>>>> I see it there are two options:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. make this designed for stuff without privacy concerns
>>>>>>>>>>>>        I think this is both the easiest and the best option.  I
>>>>>>>>>>>> don't really like the idea of a public, free service doing
>>>>>>>>>>>> computations for an evil corporation anyway; if it's being
>>>>>>>>>>>> done BY the public it should be done FOR the public.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. add in a small amount of functionality designed to
>>>>>>>>>>>> facilitate dealing with privacy concerns
>>>>>>>>>>>>        At the level of this project, that would probably just
>>>>>>>>>>>> be the controls on what data gets sent to what people.  There
>>>>>>>>>>>> might be reasons for adding such controls anyway; some tasks
>>>>>>>>>>>> could be designated for only "trusted" users.
>>>>>>>>>>>>
>>>>>>>>>>>> Either way I doubt that this will be a big issue.  I think
>>>>>>>>>>>> maybe a bigger issue is how to run arbitrary code efficiently
>>>>>>>>>>>> and securely.
>>>>>>>>>>>>
>>>>>>>>>>>> I see only a few solutions:
>>>>>>>>>>>>
>>>>>>>>>>>>        Don't allow arbitrary code, but only a defined set of
>>>>>>>>>>>> tasks.  Or, similarly, allow some "trusted" set of tasks, each
>>>>>>>>>>>> separately ported to each platform (like BOINC).
>>>>>>>>>>>>
>>>>>>>>>>>>        Use Java.  This lets us easily sandbox it and is
>>>>>>>>>>>> cross-platform, but sacrifices a bit on efficiency.  Also,
>>>>>>>>>>>> Java can be annoying (although other JVM languages would also
>>>>>>>>>>>> work in this situation).
>>>>>>>>>>>>
>>>>>>>>>>>>        There are ways of running cross-platform C/C++ code in
>>>>>>>>>>>> a sandbox as well.  One possibility is to use LLVM, although
>>>>>>>>>>>> the LLVM developers specifically say that LLVM is NOT designed
>>>>>>>>>>>> to be used this way.  Another possibility is to use a
>>>>>>>>>>>> sandboxed code system that works on multiple operating systems
>>>>>>>>>>>> but only on x86.  This includes things like VX32, which is
>>>>>>>>>>>> apparently portable to Windows, but hasn't been ported.  I
>>>>>>>>>>>> don't know whether or not that sort of thing is within our
>>>>>>>>>>>> abilities.  Another option might be Google Native Client; that
>>>>>>>>>>>> is designed to be used in a web browser but I don't know how
>>>>>>>>>>>> hard it would be to "rip out" the sandboxing/cross-OS x86 code
>>>>>>>>>>>> stuff.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Mailing list: https://launchpad.net/~modred
>>>>>>>>>>>> Post to     : modred@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>> Unsubscribe : https://launchpad.net/~modred
>>>>>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>>>
>>>>>>>>>>>>


-- 
Scott Lawrence

Webmaster
The Blair Robot Project
Montgomery Blair High School