commonsense team mailing list archive
Message #00016
Re: [Bug 373398] [NEW] Effective 2 GB limit on blend input
The blend tensor only has to keep the input tensors around if you want
to adjust the blending factors. It could throw them out otherwise.
Likewise, the conversion to a CSCMatrix could be made destructive.
Or SVDLIBC could be ported to work on a Tensor directly.
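A destructive conversion could look roughly like this (a sketch, not Divisi's actual code; `entries` stands in for a DictTensor's backing dict, and the function name is illustrative):

```python
from array import array

def dict_to_coo_destructive(entries):
    """Convert a {(row, col): value} dict into COO-style parallel
    arrays, emptying the dict as we go so the full dict and the full
    arrays never coexist in memory at the same time."""
    rows, cols, vals = array('l'), array('l'), array('d')
    while entries:
        (r, c), v = entries.popitem()  # removes the entry as it's read
        rows.append(r)
        cols.append(c)
        vals.append(v)
    return rows, cols, vals
```

Peak memory is then roughly one copy of the data plus whatever popitem hasn't freed yet, instead of two full copies.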
Another option is actually storing the biggest tensors on disk, using
(gasp!) ZODB. This is actually efficient. Sorta.
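The disk-backed idea, sketched with the stdlib shelve module standing in for ZODB (same concept, a persistent mapping; the string-key encoding is just for illustration):

```python
import os
import shelve
import tempfile

# A disk-backed mapping for the largest tensors: entries live on disk,
# not in RAM. shelve needs string keys, so encode (row, col) as "r,c".
path = os.path.join(tempfile.mkdtemp(), "tensor")
with shelve.open(path) as store:
    store["0,1"] = 2.5
    store["3,4"] = -1.0

# Reopening the shelf reads values back from disk on demand.
with shelve.open(path) as store:
    print(store["0,1"])  # prints 2.5
```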
We also have some low-hanging fruit: DictTensor is storing Python
objects, not raw integers.
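To put a rough number on that: each entry in a dict-of-Python-objects layout pays for a boxed float, a key tuple, and dict hashing overhead, versus 8 bytes per number in a packed array. A quick comparison (illustrative, not DictTensor's actual internals):

```python
import sys
from array import array

n = 100_000

# DictTensor-style storage: {(row, col): value} with boxed objects.
dict_store = {(i, 0): float(i) for i in range(n)}
dict_bytes = sys.getsizeof(dict_store) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in dict_store.items()
)

# Raw storage: parallel index/value arrays, one machine word per number.
rows = array('l', range(n))
cols = array('l', [0] * n)
vals = array('d', (float(i) for i in range(n)))
raw_bytes = sum(a.itemsize * len(a) for a in (rows, cols, vals))

print(dict_bytes / raw_bytes)  # severalfold overhead for the dict
```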
-Ken
On Thu, May 7, 2009 at 3:46 PM, Rob Speer <rspeer@xxxxxxx> wrote:
> Public bug reported:
>
> The blending code currently multiplies all the input data, and puts it
> into a sparse matrix, before running the blend SVD.
>
> There may in fact be multiple copies of all the data: the original input
> tensors, the blend tensor, and the CSCMatrix.
>
> This quickly hits the 2GB memory limit in 32-bit Python (or,
> equivalently, quickly eats up 4GB or more of RAM in 64-bit Python). We
> need a way to conserve memory. Some possibilities:
>
> * Incremental approaches (perhaps using Jayant's 'hit all the zeros at once' idea to make incremental SVD spiky like Lanczos SVD is)
> * SVD of SVDs (add the svd.u's together, not the input matrices, and svd again; sigma and v need to be reconstructed in other ways)
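The "SVD of SVDs" bullet above can be sketched like this (assuming NumPy; names are illustrative, and this is the general combine-truncated-factors trick, not code from Divisi). Given per-matrix factors (U_i, s_i, Vt_i), the sum is B @ C with B = [U_1 s_1 | U_2 s_2 | ...] and C = [Vt_1; Vt_2; ...], and a small QR-then-SVD recovers sigma and v without materializing the full sum:

```python
import numpy as np

def svd_of_svds(parts):
    """Combine per-matrix SVDs (U_i, s_i, Vt_i) into the SVD of their
    sum, working only on small stacked factor matrices."""
    B = np.hstack([U * s for U, s, _ in parts])   # m x K, K = sum of ranks
    C = np.vstack([Vt for _, _, Vt in parts])     # K x n
    QB, RB = np.linalg.qr(B)
    QC, RC = np.linalg.qr(C.T)
    # The whole problem reduces to an SVD of the small core RB @ RC.T.
    Um, sigma, Vmt = np.linalg.svd(RB @ RC.T, full_matrices=False)
    return QB @ Um, sigma, Vmt @ QC.T
```

In practice you'd then keep only the top-k singular triples; with truncated inputs the result is an approximation rather than the exact SVD of the sum.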
>
> ** Affects: divisi
> Importance: Medium
> Assignee: Rob Speer (rspeer)
> Status: Confirmed
>
>
> ** Tags: efficiency
>
> ** Changed in: divisi
> Importance: Undecided => Medium
>
> ** Changed in: divisi
> Status: New => Confirmed
>
> ** Changed in: divisi
> Assignee: (unassigned) => Rob Speer (rspeer)
>
> --
> Effective 2 GB limit on blend input
> https://bugs.launchpad.net/bugs/373398
> You received this bug notification because you are a member of
> Commonsense Computing, which is the registrant for Divisi.
>
> Status in Divisi: Confirmed
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~commonsense
> Post to : commonsense@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~commonsense
> More help : https://help.launchpad.net/ListHelp
>