Affichage des articles dont le libellé est lists. Afficher tous les articles
Affichage des articles dont le libellé est lists. Afficher tous les articles

vendredi 24 janvier 2014

Re: generate De Bruijn sequence memory and string vs lists topic




On Fri, Jan 24, 2014 at 2:23 AM, Peter Otten <(E-Mail Removed)> wrote:

> Then, how do you think Python /knows/ that it has to repeat the code 10
> times on my "slow" and 100 times on your "fast" machine? It runs the bench
> once, then 10, then 100, then 1000 times -- until there's a run that takes
> 0.2 secs or more. The total expected minimum time without startup overhead
> is then
>


​Ah, I did not know about the calibration. That and I did not notice the
100 on my machine vs 10 on yours.​


Vincent Davis






jeudi 23 janvier 2014

Re: generate De Bruijn sequence memory and string vs lists topic




Vincent Davis <(E-Mail Removed)> Wrote in message:
>


I didn't really study the code, and the fact that there's a
nested function could mess it up. But if it were a
straightforward function with exactly one append, , then
replacing the append with a yield would produce the string one
character at a time.


--
DaveA






Re: generate De Bruijn sequence memory and string vs lists topic




On Thu, Jan 23, 2014 at 3:15 PM, Peter Otten <(E-Mail Removed)> wrote:

> $ python -m timeit -s 'from debruijn_compat import debruijn as d' 'd(4, 8)'
> 10 loops, best of 3: 53.5 msec per loop
> $ python -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
> 'd(4, 8)'
> 10 loops, best of 3: 22.2 msec per loop
> $ python3 -m timeit -s 'from debruijn_compat import debruijn as d' 'd(4,
> 8)'
> 10 loops, best of 3: 68 msec per loop
> $ python3 -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
> 'd(4, 8)'
> 10 loops, best of 3: 21.7 msec per loop
>


Excellent Peter!
I have a question, the times reported don't make sense to me, for example
$ python3 -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
'd(4, 8)'
100 loops, best of 3: 10.2 msec per loop
This took ~4 secs (stop watch) which is much more that 10*.0102 Why is this?

$ python3 -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
'd(4, 11)'
10 loops, best of 3: 480 msec per loop​
This took ~20 secs vs .480*10

d(4, 14) takes about 24 seconds (one run)

Vincent Davis






Re: generate De Bruijn sequence memory and string vs lists topic




Vincent Davis wrote:

> On Thu, Jan 23, 2014 at 2:36 PM, Mark Lawrence
> <(E-Mail Removed)>wrote:
>
>> FTR string.maketrans is gone from Python 3.2+. Quoting from
>> http://docs.python.org/dev/whatsnew/...-to-python-3-2 "The
>> previously deprecated string.maketrans() function has been removed in
>> favor of the static methods bytes.maketrans() and bytearray.maketrans().
>> This change solves the confusion around which types were supported by the
>> string module. Now, str, bytes, and bytearray each have their own
>> maketrans and translate methods with intermediate translation tables of
>> the appropriate type."
>>

>
> ​Thanks for pointing this out Mark, ​I will soon be running this on 3.3+


Well, my first post in this thread head this suspicious comment:

> # Python 2
> def debruijn(k, n):


In hindsight I have no idea what I was trying to say ;)

Anyway, as a special service to Mark and Vincent here's an updated version
that might work on both Python 2 and 3 (there's no test but the ad-hoc demo
in the if __name__ == "__main__" block):

[debruijn is Vincents original code, debruijn_bytes my modified version]

$ cat debruijn_compat.py
def debruijn(k, n):
"""
De Bruijn sequence for alphabet size k (0,1,2...k-1)
and subsequences of length n.
From wikipedia Sep 22 2013
"""
a = [0] * k * n
sequence = []
def db(t, p,):
if t > n:
if n % p == 0:
for j in range(1, p + 1):
sequence.append(a[j])
else:
a[t] = a[t - p]
db(t + 1, p)
for j in range(int(a[t - p]) + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return ''.join(map(str, sequence))

_mapping = bytearray(b"?")*256
_mapping[:10] = b"0123456789"

def debruijn_bytes(k, n):
a = k * n * bytearray([0])
sequence = bytearray()
extend = sequence.extend
def db(t, p):
if t > n:
if n % p == 0:
extend(a[1: p+1])
else:
a[t] = a[t - p]
db(t + 1, p)
for j in range(a[t - p] + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return sequence.translate(_mapping).decode("ascii")

if __name__ == "__main__":
d1 = debruijn(4, 8)
d2 = debruijn_bytes(4, 8)

print(d1[:50])
print(d2[:50])
assert d1 == d2

$ python debruijn_compat.py
00000000100000002000000030000001100000012000000130
00000000100000002000000030000001100000012000000130
$ python3 debruijn_compat.py
00000000100000002000000030000001100000012000000130
00000000100000002000000030000001100000012000000130
$ python -m timeit -s 'from debruijn_compat import debruijn as d' 'd(4, 8)'
10 loops, best of 3: 53.5 msec per loop
$ python -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
'd(4, 8)'
10 loops, best of 3: 22.2 msec per loop
$ python3 -m timeit -s 'from debruijn_compat import debruijn as d' 'd(4, 8)'
10 loops, best of 3: 68 msec per loop
$ python3 -m timeit -s 'from debruijn_compat import debruijn_bytes as d'
'd(4, 8)'
10 loops, best of 3: 21.7 msec per loop







Re: generate De Bruijn sequence memory and string vs lists topic




Vincent Davis wrote:

> On Thu, Jan 23, 2014 at 12:02 PM, Peter Otten <(E-Mail Removed)> wrote:
>>
>> I just noted that the first Python loop can be eliminated:



Oops, I forgot to paste

import string
def chars(a, b):
return "".join(map(chr, range(a, b)))
_mapping = string.maketrans(chars(0, 10), chars(48, 58))


>> def debruijn(k, n):
>> a = k * n * bytearray([0])
>> sequence = bytearray()
>> extend = sequence.extend # factor out method lookup
>> def db(t, p):
>> if t > n:
>> if n % p == 0:
>> extend(a[1: p+1])
>> else:
>> a[t] = a[t - p]
>> db(t + 1, p)
>> for j in xrange(a[t - p] + 1, k):
>> a[t] = j
>> db(t + 1, t)
>> db(1, 1)
>> return sequence.translate(_mapping)

>
>
> I am not really sure what _mapping should be. The code above does not run
> because
> NameError: global name '_mapping' is not defined
> I tried to get the bytearray
> ​ ​
> sequence to convert to ascii but don't know how to.


It does the same as adding 48 to every byte in the first variant:

>>> import string
>>> def chars(a, b):

.... return "".join(map(chr, range(a, b)))
....
>>> _mapping = string.maketrans(chars(0, 10), chars(48, 58))
>>> b = bytearray(range(10))
>>> b

bytearray(b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t ')
>>> b.translate(_mapping)

bytearray(b'0123456789')

The disadvantage of this approach is that it produces a new bytearray, i. e.
it increases the peak amount of memory used by the function significantly.






Re: generate De Bruijn sequence memory and string vs lists topic




On 23/01/2014 20:10, Peter Otten wrote:
> Vincent Davis wrote:
>
>> On Thu, Jan 23, 2014 at 12:02 PM, Peter Otten <(E-Mail Removed)> wrote:
>>>
>>> I just noted that the first Python loop can be eliminated:

>
>
> Oops, I forgot to paste
>
> import string
> def chars(a, b):
> return "".join(map(chr, range(a, b)))
> _mapping = string.maketrans(chars(0, 10), chars(48, 58))
>


FTR string.maketrans is gone from Python 3.2+. Quoting from
http://docs.python.org/dev/whatsnew/...-to-python-3-2 "The
previously deprecated string.maketrans() function has been removed in
favor of the static methods bytes.maketrans() and bytearray.maketrans().
This change solves the confusion around which types were supported by
the string module. Now, str, bytes, and bytearray each have their own
maketrans and translate methods with intermediate translation tables of
the appropriate type."

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence






Re: generate De Bruijn sequence memory and string vs lists topic




On Thu, Jan 23, 2014 at 2:36 PM, Mark Lawrence <(E-Mail Removed)>wrote:

> FTR string.maketrans is gone from Python 3.2+. Quoting from
> http://docs.python.org/dev/whatsnew/...-to-python-3-2 "The
> previously deprecated string.maketrans() function has been removed in favor
> of the static methods bytes.maketrans() and bytearray.maketrans(). This
> change solves the confusion around which types were supported by the string
> module. Now, str, bytes, and bytearray each have their own maketrans and
> translate methods with intermediate translation tables of the appropriate
> type."
>


​Thanks for pointing this out Mark, ​I will soon be runningthis on 3.3+


Vincent Davis
720-301-3003






Re: generate De Bruijn sequence memory and string vs lists topic




On Thu, Jan 23, 2014 at 12:02 PM, Peter Otten <(E-Mail Removed)> wrote:
>
> I just noted that the first Python loop can be eliminated:
>
> def debruijn(k, n):
> a = k * n * bytearray([0])
> sequence = bytearray()
> extend = sequence.extend # factor out method lookup
> def db(t, p):
> if t > n:
> if n % p == 0:
> extend(a[1: p+1])
> else:
> a[t] = a[t - p]
> db(t + 1, p)
> for j in xrange(a[t - p] + 1, k):
> a[t] = j
> db(t + 1, t)
> db(1, 1)
> return sequence.translate(_mapping)



I am not really sure what _mapping should be. The code above does not run
because
NameError: global name '_mapping' is not defined
I tried to get the bytearray
​ ​
sequence to convert to ascii but don't know how to.


Vincent Davis






Re: generate De Bruijn sequence memory and string vs lists topic




Peter Otten wrote:

> You could change de_bruijn_1() to use `bytearray`s instead of `list`s:
>
> # Python 2
> def debruijn(k, n):
> a = k * n * bytearray([0])
> sequence = bytearray()
> append = sequence.append # factor out method lookup
> def db(t, p,):
> if t > n:
> if n % p == 0:
> for j in xrange(1, p + 1):
> append(a[j]+48) # add 48 to convert to ascii
> else:
> a[t] = a[t - p]
> db(t + 1, p)
> for j in xrange(a[t - p] + 1, k):
> a[t] = j
> db(t + 1, t)
> db(1, 1)
> return sequence


I just noted that the first Python loop can be eliminated:

def debruijn(k, n):
a = k * n * bytearray([0])
sequence = bytearray()
extend = sequence.extend # factor out method lookup
def db(t, p):
if t > n:
if n % p == 0:
extend(a[1: p+1])
else:
a[t] = a[t - p]
db(t + 1, p)
for j in xrange(a[t - p] + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return sequence.translate(_mapping)







Re: generate De Bruijn sequence memory and string vs lists topic




On Thu, Jan 23, 2014 at 10:18 AM, Dave Angel <(E-Mail Removed)> wrote:

> If memory size is your issue, why not make the function a
> generator, by replacing the append with a yield?
>


​One more thought on the generator. I have an idea for how to use the
generator but I still need 1, chucks of size n de_brujin(k, n) and the
ordering the same ordering as found in ​de_brujin(k, n).
I am not really sure how to modify the algorithm to do that. Any ideas? I
won't have time to think hard about that until later.


Vincent Davis
720-301-3003






Re: generate De Bruijn sequence memory and string vs lists topic




Vincent Davis wrote:

> For reference, Wikipedia entry for De Bruijn sequence
> http://en.wikipedia.org/wiki/De_Bruijn_sequence
>
> At the above link is a python algorithm for generating De Brujin
> sequences. It works fine but outputs a list of integers [0, 0, 0, 1, 0, 1,
> 1, 1] and I would prefer a string '00010111'. This can be accomplished by
> changing the last line from;
> return sequence
> to
> return ''.join([str(i) for i in sequence])
> See de_bruijn_1 Below.
>
> The other option would be to manipulate strings directly (kind of).
> I butchered the original algorithm to do this. See de_bruijn_2 below. But
> it is much slower and ungly.
>
> I am wanting to make a few large De Bruijin sequences. hopefully on the
> order of de_bruijn(4, 50) to de_bruijn(4, 100) (wishful thinking?). I
> don't know the limits (memory or time) for the current algorithms. I think
> I am will hit the memory mazsize limit at about 4^31. The system I will be
> using has 64GB RAM.
> The size of a De Brujin sequence is k^n
>
> My questions;
> 1, de_bruijn_2 is ugly, any suggestions to do it better?
> 2, de_bruijn_2 is significantly slower than de_bruijn_1. Speedups?
> 3, Any thought on which is more memory efficient during computation.
>
> #### 1 ####
> def de_bruijn_1(k, n):
> """
> De Bruijn sequence for alphabet size k (0,1,2...k-1)
> and subsequences of length n.
> From wikipedia Sep 22 2013
> """
> a = [0] * k * n
> sequence = []
> def db(t, p,):
> if t > n:
> if n % p == 0:
> for j in range(1, p + 1):
> sequence.append(a[j])
> else:
> a[t] = a[t - p]
> db(t + 1, p)
> for j in range(int(a[t - p]) + 1, k):
> a[t] = j
> db(t + 1, t)
> db(1, 1)
> #return sequence #original
> return ''.join([str(i) for i in sequence])
>
> d1 = de_bruijn_1(4, 8)
>
> #### 2 ####
> def de_bruijn_2(k, n):
> global sequence
> a = '0' * k * n
> sequence = ''
> def db(t, p):
> global sequence
> global a
> if t > n:
> if n % p == 0:
> for j in range(1, p + 1):
> sequence = sequence + a[j]
> else:
> a = a[:t] + a[t - p] + a[t+1:]
> db(t + 1, p)
> for j in range(int(a[t - p]) + 1, k):
> a = a[:t] + str(j) + a[t+1:]
> db(t + 1, t)
> return sequence
> db(1, 1)
> return sequence
>
> d2 = de_bruijn_2(4, 8)


You could change de_bruijn_1() to use `bytearray`s instead of `list`s:

# Python 2
def debruijn(k, n):
a = k * n * bytearray([0])
sequence = bytearray()
append = sequence.append # factor out method lookup
def db(t, p,):
if t > n:
if n % p == 0:
for j in xrange(1, p + 1):
append(a[j]+48) # add 48 to convert to ascii
else:
a[t] = a[t - p]
db(t + 1, p)
for j in xrange(a[t - p] + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
return sequence







Re:generate De Bruijn sequence memory and string vs lists topic




Vincent Davis <(E-Mail Removed)> Wrote in message:
>

(something about your message seems to make it unquotable)

64gig is 4^18, so you can forget about holding a string of size 4^50

If memory size is your issue, why not make the function a
generator, by replacing the append with a yield?


--
DaveA






generate De Bruijn sequence memory and string vs lists topic




For reference, Wikipedia entry for De Bruijn sequence
http://en.wikipedia.org/wiki/De_Bruijn_sequence

At the above link is a python algorithm for generating De Brujin sequences.
It works fine but outputs a list of integers [0, 0, 0, 1, 0, 1, 1, 1] and I
would prefer a string '00010111'. This can be accomplished by changing the
last line from;
return sequence
to
return ''.join([str(i) for i in sequence])
See de_bruijn_1 Below.

The other option would be to manipulate strings directly (kind of).
I butchered the original algorithm to do this. See de_bruijn_2 below. But
it is much slower and ungly.

I am wanting to make a few large De Bruijin sequences. hopefully on the
order of de_bruijn(4, 50) to de_bruijn(4, 100) (wishful thinking?). I don't
know the limits (memory or time) for the current algorithms. I think I am
will hit the memory mazsize limit at about 4^31. The system I will be using
has 64GB RAM.
The size of a De Brujin sequence is k^n

My questions;
1, de_bruijn_2 is ugly, any suggestions to do it better?
2, de_bruijn_2 is significantly slower than de_bruijn_1. Speedups?
3, Any thought on which is more memory efficient during computation.

#### 1 ####
def de_bruijn_1(k, n):
"""
De Bruijn sequence for alphabet size k (0,1,2...k-1)
and subsequences of length n.
From wikipedia Sep 22 2013
"""
a = [0] * k * n
sequence = []
def db(t, p,):
if t > n:
if n % p == 0:
for j in range(1, p + 1):
sequence.append(a[j])
else:
a[t] = a[t - p]
db(t + 1, p)
for j in range(int(a[t - p]) + 1, k):
a[t] = j
db(t + 1, t)
db(1, 1)
#return sequence #original
return ''.join([str(i) for i in sequence])

d1 = de_bruijn_1(4, 8)

#### 2 ####
def de_bruijn_2(k, n):
global sequence
a = '0' * k * n
sequence = ''
def db(t, p):
global sequence
global a
if t > n:
if n % p == 0:
for j in range(1, p + 1):
sequence = sequence + a[j]
else:
a = a[:t] + a[t - p] + a[t+1:]
db(t + 1, p)
for j in range(int(a[t - p]) + 1, k):
a = a[:t] + str(j) + a[t+1:]
db(t + 1, t)
return sequence
db(1, 1)
return sequence

d2 = de_bruijn_2(4, 8)


Vincent Davis