================================================================================
pyexcel - Lets you focus on data, instead of file formats
================================================================================
.. image:: https://raw.githubusercontent.com/pyexcel/pyexcel.github.io/master/images/patreon.png
:target: https://www.patreon.com/pyexcel
.. image:: https://api.bountysource.com/badge/team?team_id=288537
:target: https://salt.bountysource.com/teams/chfw-pyexcel
.. image:: https://travis-ci.org/pyexcel/pyexcel.svg?branch=master
:target: http://travis-ci.org/pyexcel/pyexcel
.. image:: https://codecov.io/gh/pyexcel/pyexcel/branch/master/graph/badge.svg
:target: https://codecov.io/gh/pyexcel/pyexcel
.. image:: https://img.shields.io/gitter/room/gitterHQ/gitter.svg
:target: https://gitter.im/pyexcel/Lobby
.. image:: https://readthedocs.org/projects/pyexcel/badge/?version=latest
:target: http://pyexcel.readthedocs.org/en/latest/
Support the project
================================================================================
If your company has embedded pyexcel and its components into a revenue-generating
product, please support me on `patreon <https://www.patreon.com/bePatron?u=5537627>`_
or `bounty source <https://salt.bountysource.com/teams/chfw-pyexcel>`_ to maintain
the project and develop it further.

If you are an individual, you are welcome to support me too, for however long
you feel like. As my backer, you will receive
`early access to pyexcel related contents <https://www.patreon.com/pyexcel/posts>`_.

Your issues will also be prioritized if you become my patron as a `pyexcel pro user`.

With your financial support, I will be able to invest
a little bit more time in coding, documentation and writing interesting posts.
Known constraints
==================
Fonts, colors and charts are not supported.
Introduction
================================================================================
Feature Highlights
===================
1. One application programming interface (API) to handle multiple data sources:
* physical file
* memory file
* SQLAlchemy table
* Django Model
* Python data structures: dictionary, records and array
2. One API to read and write data in various excel file formats.
3. For large data sets, data streaming is supported. A generator can be returned to you. Check out iget_records, iget_array, isave_as and isave_book_as.
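
The three Python data structures named in item 1 can be pictured with a short plain-Python sketch. It does not call pyexcel; the variable names are illustrative only, and it assumes that the first row of the array holds the column names.

.. code-block:: python

    # array: a list of rows; here the first row is assumed to hold column names
    array = [["Name", "Age"], ["Adam", 28], ["Beatrice", 29]]
    header, data_rows = array[0], array[1:]

    # records: one dictionary per data row, keyed by column name
    records = [dict(zip(header, row)) for row in data_rows]

    # dictionary: each column name maps to the list of values in that column
    columns = {
        name: [row[index] for row in data_rows]
        for index, name in enumerate(header)
    }

All three hold the same table; which one is convenient depends on whether the code works row by row or column by column.
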
Installation
================================================================================
You can install pyexcel via pip:
.. code-block:: bash
$ pip install pyexcel
or clone it and install it:
.. code-block:: bash
$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install
Usage
===============
Please note that you will have to use '.sortable.html' in order to replicate the example.
.. image:: https://github.com/pyexcel/pyexcel-sortable/raw/master/sortable.gif
.. code-block:: python
>>> # pip install pyexcel-text==0.2.7.1
>>> import pyexcel as p
>>> ccs_insight2 = p.Sheet()
>>> ccs_insight2.name = "Worldwide Mobile Phone Shipments (Billions), 2017-2021"
>>> ccs_insight2.ndjson = """
... {"year": ["2017", "2018", "2019", "2020", "2021"]}
... {"smart phones": [1.53, 1.64, 1.74, 1.82, 1.90]}
... {"feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]}
... """.strip()
>>> ccs_insight2
pyexcel sheet:
+----------------+------+------+------+------+------+
| year | 2017 | 2018 | 2019 | 2020 | 2021 |
+----------------+------+------+------+------+------+
| smart phones | 1.53 | 1.64 | 1.74 | 1.82 | 1.9 |
+----------------+------+------+------+------+------+
| feature phones | 0.46 | 0.38 | 0.3 | 0.23 | 0.17 |
+----------------+------+------+------+------+------+
Suppose you have the following data as a list of dictionaries:

========= ====
Name      Age
========= ====
Adam      28
Beatrice  29
Ceri      30
Dean      26
========= ====

You can easily save it into an excel file using the following code:
.. code-block:: python
>>> import pyexcel
>>> # make sure you have pyexcel-xls installed
>>> a_list_of_dictionaries = [
... {
... "Name": 'Adam',
... "Age": 28
... },
... {
... "Name": 'Beatrice',
... "Age": 29
... },
... {
... "Name": 'Ceri',
... "Age": 30
... },
... {
... "Name": 'Dean',
... "Age": 26
... }
... ]
>>> pyexcel.save_as(records=a_list_of_dictionaries, dest_file_name="your_file.xls")
And here's how to obtain the records:
.. code-block:: python
>>> import pyexcel as p
>>> records = p.iget_records(file_name="your_file.xls")
>>> for record in records:
... print("%s is aged at %d" % (record['Name'], record['Age']))
Adam is aged at 28
Beatrice is aged at 29
Ceri is aged at 30
Dean is aged at 26
>>> p.free_resources()
Advanced usage :fire:
----------------------
If you are dealing with big data, please consider these usages:
>>> def increase_everyones_age(generator):
... for row in generator:
... row['Age'] += 1
... yield row
>>> def duplicate_each_record(generator):
... for row in generator:
... yield row
... yield row
>>> records = p.iget_records(file_name="your_file.xls")
>>> io=p.isave_as(records=duplicate_each_record(increase_everyones_age(records)),
... dest_file_type='csv', dest_lineterminator='\n')
>>> print(io.getvalue())
Age,Name
29,Adam
29,Adam
30,Beatrice
30,Beatrice
31,Ceri
31,Ceri
27,Dean
27,Dean
<BLANKLINE>
Two advantages of the above method:

#. You can add as many wrapping functions as you want.
#. Memory consumption stays constant.
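
Both advantages follow from generators being lazy: a row is only produced when the consumer asks for it. The sketch below shows this in plain Python without pyexcel. ``read_rows`` is a hypothetical stand-in for a streaming reader such as iget_records, and ``pulled`` exists only to record when a row is actually produced.

.. code-block:: python

    pulled = []  # names of the rows that the source has produced so far


    def read_rows():
        """Hypothetical streaming source: yields one record at a time."""
        for name, age in [("Adam", 28), ("Beatrice", 29), ("Ceri", 30)]:
            pulled.append(name)
            yield {"Name": name, "Age": age}


    def add_one_year(rows):
        """Wrapping function: yields a copy of each record with Age increased."""
        for row in rows:
            yield dict(row, Age=row["Age"] + 1)


    # Building the pipeline reads nothing yet.
    pipeline = add_one_year(read_rows())
    pulled_before = list(pulled)

    # Asking for one record pulls exactly one row from the source.
    first = next(pipeline)
    pulled_after_first = list(pulled)

    # Consuming the rest pulls the remaining rows, one at a time.
    rest = list(pipeline)

Because each stage holds a single row at a time, adding more wrapping functions does not increase the number of rows kept in memory.
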
Available Plugins
=================
.. _file-format-list:
.. _a-map-of-plugins-and-file-formats:
.. table:: A list of file formats supported by external plugins

   ======================== ======================= ================= ==================
   Package name             Supported file formats  Dependencies      Python versions
   ======================== ======================= ================= ==================
   `pyexcel-io`_            csv, csvz [#f1]_, tsv,                    2.6, 2.7, 3.3,
                            tsvz [#f2]_                               3.4, 3.5, 3.6
                                                                      pypy
   `pyexcel-xls`_           xls, xlsx(read only),   `xlrd`_,          same as above
                            xlsm(read only)         `xlwt`_
   `pyexcel-xlsx`_          xlsx                    `openpyxl`_       same as above
   `pyexcel-ods3`_          ods                     `pyexcel-ezodf`_, 2.6, 2.7, 3.3, 3.4
                                                    lxml              3.5, 3.6
   `pyexcel-ods`_           ods                     `odfpy`_          same as above
   ======================== ======================= ================= ==================

.. table:: Dedicated file readers and writers

   ======================== ======================= ================= ==================
   Package name             Supported file formats  Dependencies      Python versions
   ======================== ======================= ================= ==================
   `pyexcel-xlsxw`_         xlsx(write only)        `XlsxWriter`_     Python 2 and 3
   `pyexcel-xlsxr`_         xlsx(read only)         lxml              same as above
   `pyexcel-xlsbr`_         xlsb(read only)         pyxlsb            same as above
   `pyexcel-odsr`_          read only for ods, fods lxml              same as above
   `pyexcel-odsw`_          write only for ods      loxun             same as above
   `pyexcel-htmlr`_         html(read only)         lxml,html5lib     same as above
   `pyexcel-pdfr`_          pdf(read only)          pdftables         Python 2 only.
   ======================== ======================= ================= ==================

.. _pyexcel-io: https://github.com/pyexcel/pyexcel-io
.. _pyexcel-xls: https://github.com/pyexcel/pyexcel-xls
.. _pyexcel-xlsx: https://github.com/pyexcel/pyexcel-xlsx
.. _pyexcel-ods: https://github.com/pyexcel/pyexcel-ods
.. _pyexcel-ods3: https://github.com/pyexcel/pyexcel-ods3
.. _pyexcel-odsr: https://github.com/pyexcel/pyexcel-odsr
.. _pyexcel-odsw: https://github.com/pyexcel/pyexcel-odsw
.. _pyexcel-pdfr: https://github.com/pyexcel/pyexcel-pdfr
.. _pyexcel-xlsxw: https://github.com/pyexcel/pyexcel-xlsxw
.. _pyexcel-xlsxr: https://github.com/pyexcel/pyexcel-xlsxr
.. _pyexcel-xlsbr: https://github.com/pyexcel/pyexcel-xlsbr
.. _pyexcel-htmlr: https://github.com/pyexcel/pyexcel-htmlr
.. _xlrd: https://github.com/python-excel/xlrd
.. _xlwt: https://github.com/python-excel/xlwt
.. _openpyxl: https://bitbucket.org/openpyxl/openpyxl
.. _XlsxWriter: https://github.com/jmcnamara/XlsxWriter
.. _pyexcel-ezodf: https://github.com/pyexcel/pyexcel-ezodf
.. _odfpy: https://github.com/eea/odfpy
.. table:: Other data renderers

   ======================== ======================= ================= ==================
   Package name             Supported file formats  Dependencies      Python versions
   ======================== ======================= ================= ==================
   `pyexcel-text`_          write only: rst,        `tabulate`_       2.6, 2.7, 3.3, 3.4
                            mediawiki, html,                          3.5, 3.6, pypy
                            latex, grid, pipe,
                            orgtbl, plain simple
                            read only: ndjson
                            r/w: json
   `pyexcel-handsontable`_  handsontable in html    `handsontable`_   same as above
   `pyexcel-pygal`_         svg chart               `pygal`_          2.7, 3.3, 3.4, 3.5
                                                                      3.6, pypy
   `pyexcel-sortable`_      sortable table in html  `csvtotable`_     same as above
   `pyexcel-gantt`_         gantt chart in html     `frappe-gantt`_   except pypy, same
                                                                      as above
   ======================== ======================= ================= ==================

.. _pyexcel-text: https://github.com/pyexcel/pyexcel-text
.. _tabulate: https://bitbucket.org/astanin/python-tabulate
.. _pyexcel-handsontable: https://github.com/pyexcel/pyexcel-handsontable
.. _handsontable: https://cdnjs.com/libraries/handsontable
.. _pyexcel-pygal: https://github.com/pyexcel/pyexcel-chart
.. _pygal: https://github.com/Kozea/pygal
.. _pyexcel-matplotlib: https://github.com/pyexcel/pyexcel-matplotlib
.. _matplotlib: https://matplotlib.org
.. _pyexcel-sortable: https://github.com/pyexcel/pyexcel-sortable
.. _csvtotable: https://github.com/vividvilla/csvtotable
.. _pyexcel-gantt: https://github.com/pyexcel/pyexcel-gantt
.. _frappe-gantt: https://github.com/frappe/gantt
To manage the list of installed plugins, use pip to add or remove
a plugin. When you use virtualenv, you can have different plugins per virtual
environment. If several plugins in your environment do the same thing,
you need to tell pyexcel which plugin to use per function call.
For example, if both pyexcel-ods and pyexcel-odsr are installed and you want get_array to use pyexcel-odsr,
pass the library parameter: get_array(..., library='pyexcel-odsr').
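
A minimal sketch of such a call follows. The helper name ``read_ods_rows`` and its ``path`` argument are hypothetical, and the sketch assumes that pyexcel and pyexcel-odsr are both installed; only the ``library`` keyword comes from the description above.

.. code-block:: python

    def read_ods_rows(path):
        """Return the rows of an ods file, forcing the pyexcel-odsr plugin.

        ``path`` is a hypothetical argument naming an existing ods file.
        """
        # Imported inside the function so that defining this helper does not
        # itself require pyexcel; calling it does.
        import pyexcel

        return pyexcel.get_array(file_name=path, library="pyexcel-odsr")

Calling ``read_ods_rows("example.ods")``, where ``example.ods`` is a placeholder file name, would then read the file with pyexcel-odsr even when pyexcel-ods is also installed.
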
.. rubric:: Footnotes
.. [#f1] zipped csv file
.. [#f2] zipped tsv file
Acknowledgement
===============
All the great work has been done by odf, ezodf, xlrd, xlwt, tabulate and other
individual developers. This library only unites the data access code.
.. testcode::
:hide:
>>> import os
>>> os.unlink("your_file.xls")
License
================================================================================
New BSD License