modred team mailing list archive

Re: Ideas

 

Um... Are we going to require that all clients run Linux?

On Mon, Dec 28, 2009 at 11:08 PM, Michael Cohen <gnurdux@xxxxxxxxx> wrote:

> ptrace tells you whenever the process tried to make a system call.  You can
> then do whatever you want with that information, including recording it and
> passing it on to the kernel or doing your own action.
>
>
> Michael Cohen
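
A minimal sketch of the decision step described above: the supervisor sees each attempted system call, records it, and then either passes it on to the kernel, substitutes its own action, or refuses it. This is plain Python and does not call ptrace itself. The policy table, the verdict names, and the Supervisor class are all assumptions made up for illustration; nothing here is taken from the USACO sandbox or from Modred.

```python
from dataclasses import dataclass, field

# Hypothetical verdicts a tracer could reach for each intercepted call.
ALLOW = "allow"      # pass the call through to the kernel unchanged
EMULATE = "emulate"  # run our own handler in place of the kernel's
DENY = "deny"        # refuse the call

# Hypothetical policy table; the names and choices are illustrative only.
POLICY = {
    "read": ALLOW,
    "write": ALLOW,
    "exit_group": ALLOW,
    "open": EMULATE,
    "socket": DENY,
    "fork": DENY,
}


@dataclass
class Supervisor:
    """Models only the decision step of a tracer; it never calls ptrace."""
    log: list = field(default_factory=list)

    def on_syscall(self, name: str) -> str:
        # Anything not explicitly listed is denied (default-deny).
        verdict = POLICY.get(name, DENY)
        self.log.append((name, verdict))  # record every attempt
        return verdict


def run_trace(names):
    """Feed a sequence of attempted system calls through one supervisor."""
    sup = Supervisor()
    verdicts = [sup.on_syscall(n) for n in names]
    return verdicts, sup.log
```

Calling run_trace(["read", "open", "socket"]) returns one verdict per attempted call plus the full log. In a real tracer the same decision would be made each time the traced process stops at a system call.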
>
> Scott Lawrence wrote:
>
>> Actually, no.  Ignore that last message.
>>
>> Can you build a prototype that calls a specified function in place of
>> the kernel?
>>
>> On 12/28/09, Scott Lawrence <bytbox@xxxxxxxxx> wrote:
>>
>>> Ok.  I think that for the most part, we should block system calls.
>>>
>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>
>>>> Very little to none at all if the code doesn't make many system calls.  I
>>>> wouldn't expect it to make many anyway; the tasks that this is good for
>>>> shouldn't be ones that require much communication (because the Internet
>>>> is fairly slow; if it's always sending stuff and requiring responses,
>>>> that probably gives at least .1 second of latency per step), so it's
>>>> mostly just running on the CPU.  It would certainly add less overhead
>>>> for CPU-intensive things than, say, Java.
>>>>
>>>> Michael Cohen
>>>>
>>>> Scott Lawrence wrote:
>>>>
>>>>> And this is the only thing that needs to be done? How much will it
>>>>> slow the code down? More importantly
>>>>>
>>>>>
>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>
>>>>>> We can't actually block interrupts; that requires kernel-mode code.
>>>>>> Also, I think there are other mechanisms for system calls.
>>>>>>
>>>>>> BUT
>>>>>>
>>>>>> lucky for us, Linux (and other unixes, but with slightly different
>>>>>> implementations) has a built-in way to intercept system calls.  It's
>>>>>> called ptrace, and it is what is used for the USACO sandbox.
>>>>>>
>>>>>> Michael Cohen
>>>>>>
>>>>>> Scott Lawrence wrote:
>>>>>>
>>>>>>> Oh. I see.
>>>>>>>
>>>>>>> My first instinct is to say: "ban them!"  But it would be really nice
>>>>>>> if most existing source code could run out-of-the-box on the cluster,
>>>>>>> even if there wouldn't be a speedup.
>>>>>>>
>>>>>>> I'm not planning on supporting C/C++ on Windows - that's way too much
>>>>>>> trouble - so we only have to worry about Unix systems.  Are interrupts
>>>>>>> the only things we would have to block?
>>>>>>>
>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>>> You are simply incorrect here.  The issue isn't library calls, it's
>>>>>>>> system calls.  Libc calls themselves use system calls, which are
>>>>>>>> interrupts.  You can do everything without touching libc.  You just
>>>>>>>> do the right stuff to the stack and do an interrupt or whatever.  The
>>>>>>>> library doesn't have some special way to access the kernel.
>>>>>>>>
>>>>>>>> Michael Cohen
>>>>>>>>
>>>>>>>> Scott Lawrence wrote:
>>>>>>>>
>>>>>>>>> You're all missing the point.  I'm claiming that, properly
>>>>>>>>> implemented, Modred should require no sandboxing outside of what is
>>>>>>>>> necessary to implement its logic.
>>>>>>>>>
>>>>>>>>> So back to our good friends Alice, Bob, and Mallory.  Alice sends
>>>>>>>>> the cluster (which means she directs it to the hub, but let's just
>>>>>>>>> consider the cluster a big black box for now) some C source code.
>>>>>>>>> This code does some strange stuff - lots of file I/O and memory
>>>>>>>>> access.  What does the cluster do with this?
>>>>>>>>>
>>>>>>>>> It links the program with its own special libraries.  Even inline
>>>>>>>>> assembly has to call functions to interface with the hard drive and
>>>>>>>>> allocate memory and such.  Malicious code that gets submitted to the
>>>>>>>>> server will be sanitized in this fashion.  The only problem I see is
>>>>>>>>> with illegal memory access - but I suspect this will be dealt with,
>>>>>>>>> because the cluster has to analyze what data the client program
>>>>>>>>> accesses anyway...
>>>>>>>>>
>>>>>>>>> Now Bob wants to compile and link his program on his own computer.
>>>>>>>>> Fine.  He uses a different (smaller, incidentally) set of libraries.
>>>>>>>>> These libraries don't intercept every call of malloc and stuff -
>>>>>>>>> those are run on his computer.  But if he wants to access cluster
>>>>>>>>> data, he has to use special functions.  And he can't actually run
>>>>>>>>> code on the cluster.
>>>>>>>>>
>>>>>>>>> Now what does Mallory do again?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> Server-side I don't see an issue.  Requiring a safe VM (java's,
>>>>>>>>>> lua's, javascript's, .NET/mono, some other random thing) is
>>>>>>>>>> basically what I already said.  There are other sandboxing systems
>>>>>>>>>> that are designed to work on x86 native code, such as vx32 (I think
>>>>>>>>>> I mentioned that also).  Many of these schemes (with the exception
>>>>>>>>>> of vx32) have the advantage that they also automatically make the
>>>>>>>>>> code cross-platform.  Even vx32 is supposedly portable to Windows,
>>>>>>>>>> but nobody has done it yet and I have no idea if any of us have the
>>>>>>>>>> expertise to.
>>>>>>>>>>
>>>>>>>>>> Frederic Koehler wrote:
>>>>>>>>>>
>>>>>>>>>>> As far as sandboxing, server-side you can presumably rely on the
>>>>>>>>>>> operating system's sandboxes (per-user or perhaps some more
>>>>>>>>>>> elaborate mechanism like FreeBSD's jails).
>>>>>>>>>>>
>>>>>>>>>>> But as soon as the cluster sends code out to clients, obviously
>>>>>>>>>>> there is a big issue if we let them do whatever the hell they
>>>>>>>>>>> want.  Just preventing assembly or anything like that simply
>>>>>>>>>>> doesn't work in C/C++ (not to mention it would be surprisingly
>>>>>>>>>>> hard/irritating), since the code could still execute the system
>>>>>>>>>>> calls (you could try not linking against libc, too, but then you
>>>>>>>>>>> _really_ have no portability :P).
>>>>>>>>>>>
>>>>>>>>>>> System-call controlling is possible, but is either pretty
>>>>>>>>>>> unportable (lots of x86 assembly stuff) or slow-ish (virtual
>>>>>>>>>>> machines).
>>>>>>>>>>>
>>>>>>>>>>> That being said, if you completely separate client-sendable code
>>>>>>>>>>> from server code, I think that allays a lot of the concerns.
>>>>>>>>>>> Requiring client-sendable code to be written for some safe VM
>>>>>>>>>>> (java's, lua's, javascript's, .NET/mono, some other random thing)
>>>>>>>>>>> could avoid this.  In addition, client-sendable code would
>>>>>>>>>>> intentionally be written with knowledge of the sensitivity of the
>>>>>>>>>>> data it handles (i.e. not written at all if the data is
>>>>>>>>>>> important).
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Dec 28, 2009 at 7:49 PM, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I would still be happier if there were a sandbox, actually.  There
>>>>>>>>>>>> are ways of getting around that sort of thing that are too
>>>>>>>>>>>> complicated to prevent at the source level IMO.  For instance, you
>>>>>>>>>>>> can use inline assembly.  So we block inline assembly.  That's all
>>>>>>>>>>>> well and good, but now we've blocked people using legitimate
>>>>>>>>>>>> assembly optimizations.  Worse, what happens if they execute some
>>>>>>>>>>>> shellcody stuff, allowing them to escape?  I don't really know how
>>>>>>>>>>>> to block that at all.  On the other hand, a sandbox would not add
>>>>>>>>>>>> much overhead since these tasks will most likely use lots of CPU
>>>>>>>>>>>> time but few system calls or whatever.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>>
>>>>>>>>>>>> Scott Lawrence wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'm going to build a prototype of my privacy model.  I'm not
>>>>>>>>>>>>> going to implement the challenge-response stuff; I'll assume
>>>>>>>>>>>>> there's an implementation of that and that it works.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think I've isolated the misunderstanding about the sandboxes.
>>>>>>>>>>>>> You don't submit binary code to the Modred cluster - you either
>>>>>>>>>>>>> submit source, to be linked by the Modred cluster with the
>>>>>>>>>>>>> relevant libraries, or you link the code yourself with the
>>>>>>>>>>>>> libraries.  The libraries that you would link with merely copy
>>>>>>>>>>>>> the program over to the cluster, where it can be executed in a
>>>>>>>>>>>>> manner deemed fit by the code there.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I suppose you could say that that is a sandbox. ;-)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you read my email more carefully, you will see that I am not
>>>>>>>>>>>>>> necessarily objecting to Scott's suggestion.  I say that it is
>>>>>>>>>>>>>> not necessary, but that it would be the only thing necessary to
>>>>>>>>>>>>>> allow more problem-specific privacy tasks to be used.  The need
>>>>>>>>>>>>>> for a sandbox is pretty simple.  If we make untrusted users able
>>>>>>>>>>>>>> to ask for tasks, if they upload code, then I don't want it
>>>>>>>>>>>>>> running unsandboxed on my computer.  Otherwise, their code could
>>>>>>>>>>>>>> steal my files, wipe my hard disk, install Windows or do other
>>>>>>>>>>>>>> undesirable things.  If it is sandboxed, then arbitrary code can
>>>>>>>>>>>>>> be executed safely, as long as we trust the sandbox.  Sandboxed
>>>>>>>>>>>>>> environments are often also cross-platform, another plus, since
>>>>>>>>>>>>>> they typically replace or intercept any kind of system call.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Scott Lawrence wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Well, I'm glad someone expresses opinions I don't agree with...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think Mikey's objection to privacy concerns is that it's so
>>>>>>>>>>>>>>> problem-specific, we can't reasonably expect to have a general
>>>>>>>>>>>>>>> implementation.  But if the user specifies which parts of the
>>>>>>>>>>>>>>> data are private, the Modred hub just has to be sure to divvy up
>>>>>>>>>>>>>>> tasks in a way that gives those bits of information only to the
>>>>>>>>>>>>>>> trusted, dedicated servers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the purposes of clarity, I will be referring to dedicated
>>>>>>>>>>>>>>> servers as simply "servers", and the central server as the "hub".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't see the need for a sandbox.  Could you present some
>>>>>>>>>>>>>>> specific attacks that a sandbox would fix?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 12/28/09, Michael Cohen <gnurdux@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It seems to me that dealing with privacy concerns is an extremely
>>>>>>>>>>>>>>>> problem-specific issue.  In any given case you need to work out
>>>>>>>>>>>>>>>> how much you can give to people without letting private
>>>>>>>>>>>>>>>> information leak, but the details vary greatly from problem to
>>>>>>>>>>>>>>>> problem.  That isn't our business, and I don't think we should
>>>>>>>>>>>>>>>> concern ourselves with it too much.  The way I see it there are
>>>>>>>>>>>>>>>> two options:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. make this designed for stuff without privacy concerns
>>>>>>>>>>>>>>>>       I think this is both the easiest and the best option.  I
>>>>>>>>>>>>>>>>       don't really like the idea of a public, free service doing
>>>>>>>>>>>>>>>>       computations for an evil corporation anyway; if it's being
>>>>>>>>>>>>>>>>       done BY the public it should be done FOR the public.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2. add in a small amount of functionality designed to facilitate
>>>>>>>>>>>>>>>>    dealing with privacy concerns
>>>>>>>>>>>>>>>>       At the level of this project, that would probably just be
>>>>>>>>>>>>>>>>       the controls on what data gets sent to what people.  There
>>>>>>>>>>>>>>>>       might be reasons for adding such controls anyway; some
>>>>>>>>>>>>>>>>       tasks could be designated for only "trusted" users.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Either way I doubt that this will be a big issue.  I think maybe
>>>>>>>>>>>>>>>> a bigger issue is how to run arbitrary code efficiently and
>>>>>>>>>>>>>>>> securely.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I see only a few solutions:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       Don't allow arbitrary code, but only a defined set of
>>>>>>>>>>>>>>>>       tasks.  Or, similarly, allow some "trusted" set of tasks,
>>>>>>>>>>>>>>>>       each separately ported to each platform (like BOINC).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       Use Java.  This lets us easily sandbox it and is
>>>>>>>>>>>>>>>>       cross-platform, but sacrifices a bit on efficiency.  Also,
>>>>>>>>>>>>>>>>       Java can be annoying (although other JVM languages would
>>>>>>>>>>>>>>>>       also work in this situation).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>       There are ways of running cross-platform C/C++ code in a
>>>>>>>>>>>>>>>>       sandbox as well.  One possibility is to use LLVM, although
>>>>>>>>>>>>>>>>       the LLVM developers specifically say that LLVM is NOT
>>>>>>>>>>>>>>>>       designed to be used this way.  Another possibility is to
>>>>>>>>>>>>>>>>       use a sandboxed code system that works on multiple
>>>>>>>>>>>>>>>>       operating systems but only on x86.  This includes things
>>>>>>>>>>>>>>>>       like VX32, which is apparently portable to Windows, but
>>>>>>>>>>>>>>>>       hasn't been ported.  I don't know whether or not that sort
>>>>>>>>>>>>>>>>       of thing is within our abilities.  Another option might be
>>>>>>>>>>>>>>>>       Google Native Client; that is designed to be used in a web
>>>>>>>>>>>>>>>>       browser but I don't know how hard it would be to "rip out"
>>>>>>>>>>>>>>>>       the sandboxing/cross-OS x86 code stuff.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Michael Cohen
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Mailing list: https://launchpad.net/~modred
>>>>>>>>>>>>>>>> Post to     : modred@xxxxxxxxxxxxxxxxxxx
>>>>>>>>>>>>>>>> Unsubscribe : https://launchpad.net/~modred
>>>>>>>>>>>>>>>> More help   : https://help.launchpad.net/ListHelp
>>>>>>>>>>>>>>>>
>>> --
>>> Scott Lawrence
>>>
>>> Webmaster
>>> The Blair Robot Project
>>> Montgomery Blair High School
>>>
>>>
>>
>>
>
>
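
The suggestion earlier in this thread, that the hub give privately marked data only to trusted, dedicated servers, could be sketched roughly as follows. Task, Worker, assign, and the round-robin scheduling are all assumptions invented for illustration; the thread does not specify any of them, and a real hub would also need the challenge-response piece that the thread leaves out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    private: bool  # True if the task touches data the user marked private


@dataclass(frozen=True)
class Worker:
    name: str
    trusted: bool  # True for a dedicated "server", False for a volunteer client


def assign(tasks, workers):
    """Round-robin tasks over eligible workers.

    Private tasks are only ever given to trusted workers; public tasks may
    go to any worker.  Returns a dict mapping task name to worker name.
    """
    trusted = [w for w in workers if w.trusted]
    plan = {}
    # Separate round-robin counters for the private and public pools.
    counters = {True: 0, False: 0}
    for task in tasks:
        pool = trusted if task.private else workers
        if not pool:
            raise ValueError(f"no eligible worker for task {task.name!r}")
        i = counters[task.private]
        plan[task.name] = pool[i % len(pool)].name
        counters[task.private] = i + 1
    return plan
```

With one trusted server and two untrusted clients, every private task lands on the server while public tasks are spread over all three machines. Raising an error when no trusted worker exists is one possible design choice; queueing the task until one appears would be another.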
