commonsense team mailing list archive
Message #00016
Re: [Bug 373398] [NEW] Effective 2 GB limit on blend input
The blend tensor only has to keep the input tensors around if you want
to adjust the blending factors. It could throw them out otherwise.
Likewise, the conversion to a CSCMatrix could be made destructive.
Or SVDLIBC could be ported to work on a Tensor directly.
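A destructive conversion could look roughly like this (a sketch, not Divisi's actual code; `entries` stands in for a DictTensor's backing dict, and the function name is illustrative):

```python
from array import array

def dict_to_coo_destructive(entries):
    """Convert a {(row, col): value} dict into COO-style parallel
    arrays, emptying the dict as we go so the full dict and the full
    arrays never coexist in memory at the same time."""
    rows, cols, vals = array('l'), array('l'), array('d')
    while entries:
        (r, c), v = entries.popitem()  # removes the entry as it's read
        rows.append(r)
        cols.append(c)
        vals.append(v)
    return rows, cols, vals
```

Peak memory is then roughly one copy of the data plus whatever popitem hasn't freed yet, instead of two full copies.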
Another option is actually storing the biggest tensors on disk, using
(gasp!) ZODB. This is actually efficient. Sorta.
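The disk-backed idea, sketched with the stdlib shelve module standing in for ZODB (same concept, a persistent mapping; the string-key encoding is just for illustration):

```python
import os
import shelve
import tempfile

# A disk-backed mapping for the largest tensors: entries live on disk,
# not in RAM. shelve needs string keys, so encode (row, col) as "r,c".
path = os.path.join(tempfile.mkdtemp(), "tensor")
with shelve.open(path) as store:
    store["0,1"] = 2.5
    store["3,4"] = -1.0

# Reopening the shelf reads values back from disk on demand.
with shelve.open(path) as store:
    print(store["0,1"])  # prints 2.5
```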
We also have some low-hanging fruit: DictTensor is storing Python
objects, not raw integers.
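To put a rough number on that: each entry in a dict-of-Python-objects layout pays for a boxed float, a key tuple, and dict hashing overhead, versus 8 bytes per number in a packed array. A quick comparison (illustrative, not DictTensor's actual internals):

```python
import sys
from array import array

n = 100_000

# DictTensor-style storage: {(row, col): value} with boxed objects.
dict_store = {(i, 0): float(i) for i in range(n)}
dict_bytes = sys.getsizeof(dict_store) + sum(
    sys.getsizeof(k) + sys.getsizeof(v) for k, v in dict_store.items()
)

# Raw storage: parallel index/value arrays, one machine word per number.
rows = array('l', range(n))
cols = array('l', [0] * n)
vals = array('d', (float(i) for i in range(n)))
raw_bytes = sum(a.itemsize * len(a) for a in (rows, cols, vals))

print(dict_bytes / raw_bytes)  # severalfold overhead for the dict
```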
-Ken
On Thu, May 7, 2009 at 3:46 PM, Rob Speer <rspeer@xxxxxxx> wrote:
> Public bug reported:
>
> The blending code currently multiplies all the input data, and puts it
> into a sparse matrix, before running the blend SVD.
>
> There may in fact be multiple copies of all the data: the original input
> tensors, the blend tensor, and the CSCMatrix.
>
> This quickly hits the 2GB memory limit in 32-bit Python (or,
> equivalently, quickly eats up 4GB or more of RAM in 64-bit Python). We
> need a way to conserve memory. Some possibilities:
>
> * Incremental approaches (perhaps using Jayant's 'hit all the zeros at once' idea to make incremental SVD spiky like Lanczos SVD is)
> * SVD of SVDs (add the svd.u's together, not the input matrices, and svd again; sigma and v need to be reconstructed in other ways)
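The "SVD of SVDs" bullet above can be sketched like this (assuming NumPy; names are illustrative, and this is the general combine-truncated-factors trick, not code from Divisi). Given per-matrix factors (U_i, s_i, Vt_i), the sum is B @ C with B = [U_1 s_1 | U_2 s_2 | ...] and C = [Vt_1; Vt_2; ...], and a small QR-then-SVD recovers sigma and v without materializing the full sum:

```python
import numpy as np

def svd_of_svds(parts):
    """Combine per-matrix SVDs (U_i, s_i, Vt_i) into the SVD of their
    sum, working only on small stacked factor matrices."""
    B = np.hstack([U * s for U, s, _ in parts])   # m x K, K = sum of ranks
    C = np.vstack([Vt for _, _, Vt in parts])     # K x n
    QB, RB = np.linalg.qr(B)
    QC, RC = np.linalg.qr(C.T)
    # The whole problem reduces to an SVD of the small core RB @ RC.T.
    Um, sigma, Vmt = np.linalg.svd(RB @ RC.T, full_matrices=False)
    return QB @ Um, sigma, Vmt @ QC.T
```

In practice you'd then keep only the top-k singular triples; with truncated inputs the result is an approximation rather than the exact SVD of the sum.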
>
> ** Affects: divisi
> Importance: Medium
> Assignee: Rob Speer (rspeer)
> Status: Confirmed
>
>
> ** Tags: efficiency
>
> ** Changed in: divisi
> Importance: Undecided => Medium
>
> ** Changed in: divisi
> Status: New => Confirmed
>
> ** Changed in: divisi
> Assignee: (unassigned) => Rob Speer (rspeer)
>
> --
> Effective 2 GB limit on blend input
> https://bugs.launchpad.net/bugs/373398
> You received this bug notification because you are a member of
> Commonsense Computing, which is the registrant for Divisi.
>
> Status in Divisi: Confirmed
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~commonsense
> Post to : commonsense@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~commonsense
> More help : https://help.launchpad.net/ListHelp
>