NumPy on PyPy - Progress in February
More progress was made on the NumPy front in the past month. On the compatibility front, we now pass ~130 more tests from NumPy's suite since the end of January. Currently, we pass 2336 tests out of 3265 tests run, with many of the failures representing portions of NumPy that we don't plan to implement in the near future (object dtypes, unicode, etc). There are still some failures that do represent issues, such as special indexing cases and failures to respect subclassed ndarrays in return values, which we do plan to resolve. There are also some unimplemented components and ufuncs remaining which we hope to implement, such as nditer and mtrand. Overall, the most common array functionality should be working.
Additionally, I began to take a look at some of the loops generated by our code. One widely used loop is dot, and we were running about 5x slower than NumPy's C version. I was able to optimize the dot loop and also the general array iterator to get us to ~1.5x NumPy C time on dot operations of various sizes. Further progress in this area could be made by using CFFI to tie into BLAS libraries, when available. Also, work remains in examining traces generated for our other loops and checking for potential optimizations.
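For readers who have not looked at it before, a dot operation on two 2-D arrays boils down to a triple loop. The sketch below is a plain-Python illustration of that loop shape only; it is not the RPython code inside micronumpy, and the name `naive_dot` and the list-of-lists layout are assumptions made for the example.

```python
def naive_dot(a, b):
    """Multiply two matrices given as lists of rows (illustrative only)."""
    n, m = len(a), len(b)
    p = len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_a = a[i]
        out = result[i]
        for k in range(m):
            a_ik = row_a[k]      # hoisted out of the innermost loop
            row_b = b[k]
            for j in range(p):   # innermost loop walks rows contiguously
                out[j] += a_ik * row_b[j]
    return result


if __name__ == "__main__":
    print(naive_dot([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A loop of this kind is what a tracing JIT has to turn into tight machine code, which is why the generated traces are worth inspecting.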
To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues to IRC, our mailing list, or bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.
Cheers,
Brian
Py3k status update #13
This is the 13th status update about our work on the py3k branch, which we
can work on thanks to all of the people who donated to the py3k proposal.
We're just finishing up a cleanup of int/long types. This work helps the py3k
branch unify these types into the Python 3 int and restore JIT compilation of
machine sized integers.
This cleanup also removes multimethods from these types. PyPy has
historically used a clever implementation of multimethod dispatch for declaring
methods of the __builtin__ types in RPython.
This multimethod scheme provides some convenient features for doing this,
however we've come to the conclusion that it may be more trouble than it's
worth. A major problem of multimethods is that they generate a large amount of
stub methods which burden the already lengthy and memory hungry RPython
translation process. Also, their implementation and behavior can be somewhat
complicated/obscure.
The alternative to multimethods involves doing the work of the type checking
and dispatching rules in a more verbose, manual way. It's a little more work in
the end but less magical.
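To give a rough idea of what "verbose, manual" dispatch looks like, here is a small plain-Python sketch. The class and method names (`W_Int`, `W_Float`, `descr_add`, `descr_radd`, `add`) are hypothetical and heavily simplified; the real code is RPython and differs in detail. The point is only that each method checks the types it understands itself and returns NotImplemented otherwise.

```python
class W_Int(object):
    def __init__(self, value):
        self.value = value

    def descr_add(self, other):
        # Explicit type check instead of a generated multimethod table.
        if isinstance(other, W_Int):
            return W_Int(self.value + other.value)
        return NotImplemented


class W_Float(object):
    def __init__(self, value):
        self.value = value

    def descr_add(self, other):
        if isinstance(other, W_Float):
            return W_Float(self.value + other.value)
        if isinstance(other, W_Int):
            return W_Float(self.value + float(other.value))
        return NotImplemented

    def descr_radd(self, other):
        # Addition is commutative here, so reuse the forward method.
        return self.descr_add(other)


def add(left, right):
    """Try the left operand first, then the reflected method on the right."""
    result = left.descr_add(right)
    if result is NotImplemented:
        radd = getattr(right, "descr_radd", None)
        if radd is not None:
            result = radd(left)
    if result is NotImplemented:
        raise TypeError("unsupported operand types for +")
    return result


if __name__ == "__main__":
    print(add(W_Int(2), W_Float(0.5)).value)
```

Every type pair is spelled out by hand, which is more typing but leaves nothing for the translator to generate behind the scenes.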
Recently, Manuel Jacob finished a large cleanup effort of the
unicode/string/bytearray types that also removed their multimethods. This work
also benefits the py3k branch: it'll help with future PEP 393 (or PEP 393
alternative) work. This effort was partly sponsored by Google's Summer of
Code: thanks Manuel and Google!
Now there's only a couple major pieces left in the multimethod removal (the
float/complex types and special marshaling code) and a few minor pieces that
should be relatively easy.
In conclusion, there's been some good progress made on py3k and multimethod
removal this winter, albeit a bit slower than we would have liked.
cheers,
Phil
The str/unicode/bytearray refactoring is not completely done yet.
Rewrites of the STM core model -- again
Hi all,
A quick note about the Software Transactional Memory (STM) front.
Since the previous post, we believe we progressed a lot by discovering an alternative core model for software transactions. Why do I say "believe"? It's because it means again that we have to rewrite from scratch the C library handling STM. This is currently work in progress. Once this is done, we should be able to adapt the existing pypy-stm to run on top of it without much rewriting effort; in fact it should simplify the difficult issues we ran into for the JIT. So while this is basically yet another restart similar to last June's, the difference is that the work that we have already put in the PyPy part (as opposed to the C library) remains.
You can read about the basic ideas of this new C library here. It is still STM-only, not HTM, but because it doesn't constantly move objects around in memory, it would be easier to adapt an HTM version. There are even potential ideas about a hybrid TM, like using HTM but only to speed up the commits. It is based on a Linux-only system call, remap_file_pages() (poll: who heard about it before? :-). As previously, the work is done by Remi Meier and myself.
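For those who answered "no" to the poll: remap_file_pages() changes which pages of a file back which pages of an existing shared mapping. Python's standard library does not expose that call, so the sketch below swaps in a different, simpler technique: two ordinary mmap mappings of the same file page. It only illustrates the underlying notion that two virtual address ranges can share one page of backing storage; it says nothing about how the C library is actually written. The function name and the temporary file are assumptions made for the example.

```python
import mmap
import os
import tempfile


def write_through_one_view_read_through_other(payload=b"hello"):
    """Map the same file page twice and show both views see the same bytes."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, mmap.PAGESIZE)       # one page of backing storage
        view_a = mmap.mmap(fd, mmap.PAGESIZE)  # shared mapping by default
        view_b = mmap.mmap(fd, mmap.PAGESIZE)  # second view of the same page
        try:
            view_a[0:len(payload)] = payload
            return bytes(view_b[0:len(payload)])
        finally:
            view_a.close()
            view_b.close()
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(write_through_one_view_read_through_other())
```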
Currently, the C library is incomplete, but early experiments show good results in running duhton, the interpreter for a minimal language created for the purpose of testing STM. Good results means we brought down the slow-downs from 60-80% (previous version) to around 15% (current version). This number measures the slow-down from the non-STM-enabled to the STM-enabled version, on one CPU core; of course, the idea is that the STM version scales up when using more than one core.
This means that we are looking forward to a result that is much better than originally predicted. The pypy-stm has chances to run at a one-thread speed that is only "n%" slower than the regular pypy-jit, for a value of "n" that is optimistically 15 --- but more likely some number around 25 or 50. This is seriously better than the original estimate, which was "between 2x and 5x". It would mean that using pypy-stm is quite worthwhile even with just two cores.
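The "worthwhile even with just two cores" claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes perfect scaling across cores, which is an idealization and not a measurement; the function names are made up for the example. A one-thread slow-down of n% means each core runs at 1 / (1 + n/100) of the regular speed.

```python
import math


def ideal_speedup(cores, slowdown_percent):
    """Speed relative to regular single-threaded PyPy, assuming perfect scaling."""
    return cores / (1.0 + slowdown_percent / 100.0)


def break_even_cores(slowdown_percent):
    """Smallest core count at which the ideal speedup exceeds 1."""
    return int(math.floor(1.0 + slowdown_percent / 100.0)) + 1


if __name__ == "__main__":
    # 15, 25, 50 are the values of "n" from the text; 100 and 400 correspond
    # to the original "between 2x and 5x" estimate.
    for n in (15, 25, 50, 100, 400):
        print(n, round(ideal_speedup(2, n), 2), break_even_cores(n))
```

Under that assumption, any n below 100 already pays off on two cores, whereas the original 2x to 5x estimate would have needed between three and six cores just to beat a single regular thread.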
More updates later...
Armin
Did you consider existing STM libraries in your implementation? It might be worthwhile to take a look at stasis (https://code.google.com/p/stasis/) which has a pretty complete set of features.
https://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-2.pdf
Stasis is not really applicable here: it's a Transactional Storage system, which despite the attempt of this paper to generalize it, is not going to apply successfully in the context of PyPy.
More comments on Hacker News.
poll response: I've heard of remap_file_pages! :)
I was wondering how to use this call when I learnt of it, but couldn't figure anything out except possibly database applications (similar) and sort algorithms (too limited). I think this call may be used when manipulating framebuffer too, there was something about having multiple mappings [to hardware] some readonly, some not.
I would like to [possibly] disagree with your statement in c7 README "Most probably, this comes with no overhead once the change is done..."
TLB cache is a limited resource and may easily be contended on large systems. Regular mmap could [in theory] use huge TLB pages, remapped individual pages cannot.
In addition there is a small penalty during first access to the remapped page, though you may consider it amortized depending on remap/reuse ratio.
Granted it's still small stuff.
Reserving one register is a cool trick, and I find it quite acceptable. It too has a small penalty, but the benefits surely outweigh those!
@Dina: Thanks for the feedback! Note that "%gs" is a special register that is usually not used: there is no direct way to read/write its actual value. It needs to be done with a syscall, at least before very recent CPUs. It can only be used in addressing instructions as an additional offset.
NumPy Status Update - December/January
Work continued on the NumPy + PyPy front steadily in December and more lightly in January. The continued focus was compatibility, targeting incorrect or unimplemented features that appeared in multiple NumPy test suite failures. We now pass ~2/3 of the NumPy test suite. The biggest improvements were made in these areas:
- Bugs in conversions of arrays/scalars to/from native types
- Fix cases where we would choose incorrect dtypes when initializing or computing results
- Improve handling of subclasses of ndarray through computations
- Support some optional arguments for array methods that are used in the pure-python part of NumPy
- Support additional attributes in arrays, array.flags, and dtypes
- Fix some indexing corner cases that arise in NumPy testing
- Implemented part of numpy.fft (cffti and cfftf)
Looking forward, we plan to continue improving the correctness of the existing implemented NumPy functionality, while also beginning to look at performance. The initial focus for performance will be to look at areas where we are significantly worse than CPython+NumPy. Those interested in trying these improvements out will need a PyPy nightly, and an install of the PyPy NumPy fork. Thanks again to the NumPy on PyPy donors for funding this work.
Many thanks for your work! Looking forward to support a full functionality of numpy in pypy!
> We now pass ~2/3 of the NumPy test suite.
Is the test coverage of numpy high enough so that a 100% green numpypy can be considered a full port? (Honest question, I have no background information suggesting the opposite.)
I can't wait to use Numpypy to speed up scientific analysis.
Are there any updates on using numpypy with a plotting package such as matplotlib?
NumPy Status Update - November
Since the PyPy 2.2 release last month, more progress has been made on the NumPy compatibility front. Initial work has been directed by running the NumPy test suite and targeting failures that appear most frequently, along with fixing the few bugs reported on the bug tracker.
Improvements were made in these areas:
- Many missing/broken scalar functionalities were added/fixed. The scalar API should match up more closely with arrays now.
- Some missing dtype functionality was added (newbyteorder, hasobject, descr, etc)
- Support for optional arguments (axis, order) was added to some ndarray functions
- Fixed some corner cases for string/record types
Most of these improvements went onto trunk after 2.2 was split, so if you're interested in trying them out or running into problems on 2.2, try the
nightly.
Thanks again to the NumPy on PyPy donors who make this continued progress possible.
Cheers,
Brian
PyGame CFFI
One of the RaspberryPi's goals is to be a fun toolkit for school children (and adults!) to learn programming and electronics with. Python and pygame are part of this toolkit. Recently the RaspberryPi Foundation funded part of the effort of porting pypy to the Pi -- making Python programs on the Pi faster!
Unfortunately, pygame is written as a Python C extension that wraps SDL, which means performance of pygame under pypy remains mediocre. To fix this, pygame needs to be rewritten using cffi to wrap SDL instead.
RaspberryPi sponsored a CTPUG (Cape Town Python User Group) hackathon to put together a proof-of-concept pygame-cffi. The day was quite successful - we got a basic version of the bub'n'bros client working on pygame-cffi (and on PyPy). The results can be found on github with contributions from the five people present at the sprint.
While far from complete, the proof of concept does show that there are no major obstacles to porting pygame to cffi and that cffi is a great way to bind your Python package to C libraries.
Amazingly, we managed to have machines running all three major platforms (OS X, Linux and Windows) at the hackathon so the code runs on all of them!
We would like to thank the Praekelt foundation for providing the venue and The Raspberry Pi foundation for providing food and drinks!
Cheers,
Simon Cross, Jeremy Thurgood, Neil Muller, David Sharpe and fijal.
First of all, pygame depends on SDL 1. Second, ctypes kinda sucks and I don't quite buy its stability (especially with changing APIs, though it can be less of an issue with SDL). It's also slow on pypy.
Ah, ok. Very nice work anyway. It's impressive what you all managed to get done in the sprint :)
Here's some information from pygame land about where the project is heading.
SDL 1 is the past, and the SDL developers are no longer putting out releases. However, I think many people will continue to patch it up for many years. SDL 2 is the future and after many years finally has a release out (2 now). pysdl2 is part of the future of pygame. pysdl2 matches the SDL 2 API as closely as possible. A pygame API on top of pysdl2 is the future of pygame.
ctypes is no good for some platforms like iOS, and the web and pypy apparently. Although note, that pysdl2 already 'works' on top of pypy.
https://bitbucket.org/marcusva/py-sdl2/
https://pysdl2.readthedocs.org/en/latest/
Happy hacking :)
Amazing - you consider a messy cffi implementation (sometimes it builds on platform X, sometimes it does not, sometimes it works, sometimes it does not) a better choice over ctypes?
@Anonymous - your comment is pretty loaded, but we do think cffi is better than ctypes on all platforms, that's why we came up with cffi in the first place. I think cffi FAQ contains an answer to that.
@Rene: if pysdl2 is a bare-metal ctypes wrapper, writing a similar cffi wrapper instead should be very straightforward (even more than the current pygame-cffi). But do you know if pygame is really going that route, and if so, how soon?
I've been looking at cffi since it was first mentioned on our Pygame mailing list. It does look promising. I see only two, buffer related, issues that need to be resolved.
First, PyPy lacks an array export mechanism comparable to the CPython PEP 3118 buffer protocol. Instead, only the NumPy Array Interface, version: 3 is available. Though Pygame supports both the Python and C sides of the interface, it relies on CPython's reference counting for timely buffer release [1]. Periodic garbage collection is too unpredictable.
Second, the cffi module does not support CPython api function calls. So a cffi Pygame could not support the buffer protocol on CPython.
A possible solution to the first issue is for PyPy to use an extended array interface that includes a PEP 3118 like buffer release callback. I am working to resolve the second issue: [Issue13797] Allow objects implemented in pure Python to export PEP 3118 buffers.
[1] Add PEP 3118 (new) buffer support to Pygame surfaces
Hm, I can't get this to work on Ubuntu 12.04 doing the following
virtualenv -p /usr/bin/pypy pypy
cd pypy
source bin/activate
pip install git+https://github.com/eliben/pycparser.git
pip install hg+https://github.com/eliben/pycparser.git
pip install hg+https://foss.heptapod.net/cffi/cffi
git clone https://github.com/CTPUG/pygame_cffi.git
cd pygame_cffi/
pypy
import pygame
>>>> import pygame
Traceback (most recent call last):
File "", line 1, in
File "pygame/__init__.py", line 9, in
from pygame.color import Color
File "pygame/color.py", line 3, in
from pygame._sdl import ffi, sdl
File "pygame/_sdl.py", line 6, in
ffi = cffi.FFI()
File "/home/me/Documents/python/pygame/pypy/site-packages/cffi/api.py", line 56, in __init__
import _cffi_backend as backend
ImportError: No module named _cffi_backend
dpkg -l pypy
...
ii pypy 1.8+dfsg-2 fast alternative implementation of Python - PyPy interpreter
Do I need a newer pypy? Am I missing something else?
great! what's current status of it? I really can't wait to use Pygame on a PI through pypy.
PyPy Leysin Winter Sprint (11-19th January 2014)
The next PyPy sprint will be in Leysin, Switzerland, for the ninth time. This is a fully public sprint: newcomers and topics other than those proposed below are welcome.
Goals and topics of the sprint
- Py3k: work towards supporting Python 3 in PyPy
- NumPyPy: work towards supporting the numpy module in PyPy
- STM: work towards supporting Software Transactional Memory
- And as usual, the main side goal is to have fun in winter sports :-) We can take a day off for skiing.
Exact times
For a change, and as an attempt to simplify things, I specified the dates as 11-19 January 2014, where 11 and 19 are travel days. We will work full days between the 12 and the 18. You are of course allowed to show up for a part of that time only, too.
Location & Accommodation
Leysin, Switzerland, "same place as before". Let me refresh your memory: both the sprint venue and the lodging will be in a very spacious pair of chalets built specifically for bed & breakfast: https://www.ermina.ch/. The place has a good ADSL Internet connexion with wireless installed. You can of course arrange your own lodging anywhere (as long as you are in Leysin, you cannot be more than a 15-minute walk away from the sprint venue), but I definitely recommend lodging there too -- you won't find a better view anywhere else (though you probably won't get much worse ones easily, either :-)
Please confirm that you are coming so that we can adjust the reservations as appropriate. The rate so far has been around 60 CHF a night all included in 2-person rooms, with breakfast. There are larger rooms too (less expensive per person) and maybe the possibility to get a single room if you really want to.
Please register by Mercurial:
https://bitbucket.org/pypy/extradoc/ https://foss.heptapod.net/pypy/extradoc/-/blob/branch/default/extradoc/sprintinfo/leysin-winter-2014
or on the pypy-dev mailing list if you do not yet have check-in rights:
https://mail.python.org/mailman/listinfo/pypy-dev
You need a Swiss-to-(insert country here) power adapter. There will be some Swiss-to-EU adapters around -- bring a EU-format power strip if you have one.
PyPy 2.2.1 - Incrementalism.1
We're pleased to announce PyPy 2.2.1, which targets version 2.7.3 of the Python language. This is a bugfix release over 2.2.
You can download the PyPy 2.2.1 release here:
https://pypy.org/download.html
What is PyPy?
PyPy is a very compliant Python interpreter, almost a drop-in replacement for CPython 2.7. It's fast (pypy 2.2 and cpython 2.7.2 performance comparison) due to its integrated tracing JIT compiler.
This release supports x86 machines running Linux 32/64, Mac OS X 64, Windows 32, or ARM (ARMv6 or ARMv7, with VFPv3).
Work on the native Windows 64 port is still stalling; we would welcome a volunteer to handle that.
Highlights
This is a bugfix release. The most important bugs fixed are:
- an issue in sockets' reference counting emulation, showing up notably when using the ssl module and calling makefile().
- Tkinter support on Windows.
- If sys.maxunicode==65535 (on Windows and maybe OS/X), the json decoder incorrectly decoded surrogate pairs.
- some FreeBSD fixes.
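As an illustration of the json item above: a character outside the Basic Multilingual Plane is written in JSON as two \uXXXX escapes forming a surrogate pair, and a correct decoder joins them into one code point where the build allows it. The sketch below shows the expected result on a current Python 3 (assumed 3.3 or later, where sys.maxunicode is always 0x10FFFF); it does not reproduce the PyPy 2.2 bug itself, and U+1F600 is just an arbitrary example character.

```python
import json


def decode_escaped_pair():
    # "\ud83d\ude00" is the JSON escape of U+1F600 as a UTF-16 surrogate pair.
    return json.loads('"\\ud83d\\ude00"')


if __name__ == "__main__":
    text = decode_escaped_pair()
    # On a wide build the pair collapses into a single code point.
    print(len(text), hex(ord(text[0])))
```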
Note that CFFI 0.8.1 was released. Both versions 0.8 and 0.8.1 are compatible with both PyPy 2.2 and 2.2.1.
Cheers, Armin Rigo & everybody
CFFI 0.8
Hi all,
CFFI 0.8 for CPython (2.6-3.x) has been released.
Quick download: pip install cffi --upgrade
Documentation: https://cffi.readthedocs.org/en/release-0.8/
What's new: a number of small fixes; ffi.getwinerror(); integrated support for C99 variable-sized structures; multi-thread safety.
--- Armin
Update: CFFI 0.8.1, with fixes on Python 3 on OS/X, and some FreeBSD fixes (thanks Tobias).
NumPy status update
The biggest change is that we shifted to using an external fork of numpy rather than a minimal numpypy module. The idea is that we will be able to reuse most of the upstream pure-python numpy components, replacing the C modules with appropriate RPython micronumpy pieces at the correct places in the module namespace.
The numpy fork should work just as well as the old numpypy for functionality that existed previously, and also include much new functionality from the pure-python numpy pieces that simply hadn't been imported yet in numpypy. However, this new functionality will not have been "hand picked" to only include pieces that work, so you may run into functionality that relies on unimplemented components (which should fail with user-level exceptions).
This setup also allows us to run the entire numpy test suite, which will help in directing future compatibility development. The recent PyPy release includes these changes, so download it and let us know how it works! And if you want to live on the edge, the nightly includes even more numpy progress made in November.
To install the fork, download the latest release, and then install numpy either separately with a virtualenv: pip install git+https://bitbucket.org/pypy/numpy.git; or directly: git clone https://bitbucket.org/pypy/numpy.git; cd numpy; pypy setup.py install.
EDIT: if you install numpy as root, you may need to also import it once as root before it works: sudo pypy -c 'import numpy'
Along with this change, progress was made in fixing internal micronumpy bugs and increasing compatibility:
- Fixed a bug with strings in record dtypes
- Fixed a bug where the multiplication of an ndarray with a Python int or float resulted in loss of the array's dtype
- Fixed several segfaults encountered in the numpy test suite (suite should run now without segfaulting)
We also began working on __array_prepare__ and __array_wrap__, which are necessary pieces for a working matplotlib module.
Cheers,
Romain and Brian
Hi,
Thanks for all your efforts on pypy-*, we really appreciate it!
I'm trying to compile numpy with pypy-2.2-osx64 but the building process (manual and pip) fails with:
AttributeError: 'module' object has no attribute 'get_makefile_filename'
Full build log: https://pastebin.com/S4dybCV0
Any idea how to resolve this?
Thanks,
t
Hey
Please put such reports to bugs.pypy.org so they don't get lost.
Thanks!
fijal
I am getting an error when installing numpy for pypy 2.2.1:
https://stackoverflow.com/questions/22342769/error-when-installing-numpy-for-pypy2-2-1
Thanks! It would be easier to repost this if the title contained pypy: "numpy in pypy - progress in February"
It would be great if the first performance optimizations were actually wrappers to BLAS; there is an outstanding BSD-licensed BLAS at https://github.com/xianyi/OpenBLAS
I believe the "performance optimizations" mentioned in the blog post are unrelated to BLAS. BLAS is about calling an external library. You can't optimize that, you just merely call it. The performance optimizations are about things like computing the matrix "a + b + c", which can be done without computing the intermediate result "a + b".
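The "a + b + c" point can be pictured with plain Python lists. This is only a sketch of the idea under simple assumptions (equal-length 1-D sequences, hypothetical function names); it is not how micronumpy is implemented. The first version materializes the intermediate "a + b", the second computes each output element in a single pass.

```python
def add3_with_temporary(a, b, c):
    """Two passes: builds the intermediate a + b before adding c."""
    tmp = [x + y for x, y in zip(a, b)]
    return [t + z for t, z in zip(tmp, c)]


def add3_fused(a, b, c):
    """One pass: no intermediate list is ever allocated."""
    return [x + y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    a, b, c = [1, 2, 3], [10, 20, 30], [100, 200, 300]
    print(add3_with_temporary(a, b, c) == add3_fused(a, b, c))
```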
Armin, I agree with you. What I'm trying to say is that making the BLAS interface may be very easy and give great performance, and people will use it most of the time if you bundle it.