pyexcel - Let you focus on data, instead of file formats
Support the project
If your company has embedded pyexcel and its components into a revenue generating product, please support me on patreon or bounty source to maintain the project and develop it further.
If you are an individual, you are welcome to support me too and for however long you feel like. As my backer, you will receive early access to pyexcel related contents.
And your issues will get prioritized if you would like to become my patreon as pyexcel pro user.
With your financial support, I will be able to invest a little bit more time in coding, documentation and writing interesting posts.
Known constraints
Fonts, colors and charts are not supported.
Introduction
Feature Highlights
- One application programming interface(API) to handle multiple data sources:
- physical file
- memory file
- SQLAlchemy table
- Django Model
- Python data structures: dictionary, records and array
- One API to read and write data in various excel file formats.
- For large data sets, data streaming are supported. A genenerator can be returned to you. Checkout iget_records, iget_array, isave_as and isave_book_as.
Installation
You can install pyexcel via pip:
$ pip install pyexcel
or clone it and install it:
$ git clone https://github.com/pyexcel/pyexcel.git
$ cd pyexcel
$ python setup.py install
Usage
Please note that you will have to use '.sortable.html' in order to replicate the example.
>>> # pip install pyexcel-text==0.2.7.1 >>> import pyexcel as p >>> ccs_insight2 = p.Sheet() >>> ccs_insight2.name = "Worldwide Mobile Phone Shipments (Billions), 2017-2021" >>> ccs_insight2.ndjson = """ ... {"year": ["2017", "2018", "2019", "2020", "2021"]} ... {"smart phones": [1.53, 1.64, 1.74, 1.82, 1.90]} ... {"feature phones": [0.46, 0.38, 0.30, 0.23, 0.17]} ... """.strip() >>> ccs_insight2 pyexcel sheet: +----------------+------+------+------+------+------+ | year | 2017 | 2018 | 2019 | 2020 | 2021 | +----------------+------+------+------+------+------+ | smart phones | 1.53 | 1.64 | 1.74 | 1.82 | 1.9 | +----------------+------+------+------+------+------+ | feature phones | 0.46 | 0.38 | 0.3 | 0.23 | 0.17 | +----------------+------+------+------+------+------+
Suppose you have the following data in a dictionary:
Name | Age |
---|---|
Adam | 28 |
Beatrice | 29 |
Ceri | 30 |
Dean | 26 |
you can easily save it into an excel file using the following code:
>>> import pyexcel >>> # make sure you had pyexcel-xls installed >>> a_list_of_dictionaries = [ ... { ... "Name": 'Adam', ... "Age": 28 ... }, ... { ... "Name": 'Beatrice', ... "Age": 29 ... }, ... { ... "Name": 'Ceri', ... "Age": 30 ... }, ... { ... "Name": 'Dean', ... "Age": 26 ... } ... ] >>> pyexcel.save_as(records=a_list_of_dictionaries, dest_file_name="your_file.xls")
And here's how to obtain the records:
>>> import pyexcel as p >>> records = p.iget_records(file_name="your_file.xls") >>> for record in records: ... print("%s is aged at %d" % (record['Name'], record['Age'])) Adam is aged at 28 Beatrice is aged at 29 Ceri is aged at 30 Dean is aged at 26 >>> p.free_resources()
Advanced usage :fire:
If you are dealing with big data, please consider these usages:
>>> def increase_everyones_age(generator): ... for row in generator: ... row['Age'] += 1 ... yield row >>> def duplicate_each_record(generator): ... for row in generator: ... yield row ... yield row >>> records = p.iget_records(file_name="your_file.xls") >>> io=p.isave_as(records=duplicate_each_record(increase_everyones_age(records)), ... dest_file_type='csv', dest_lineterminator='\n') >>> print(io.getvalue()) Age,Name 29,Adam 29,Adam 30,Beatrice 30,Beatrice 31,Ceri 31,Ceri 27,Dean 27,Dean <BLANKLINE>
Two advantages of above method:
- Add as many wrapping functions as you want.
- Constant memory consumption
Available Plugins
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-io | csv, csvz [1], tsv, tsvz [2] | 2.6, 2.7, 3.3, 3.4, 3.5, 3.6 pypy | |
pyexcel-xls | xls, xlsx(read only), xlsm(read only) | xlrd, xlwt | same as above |
pyexcel-xlsx | xlsx | openpyxl | same as above |
pyexcel-ods3 | ods | pyexcel-ezodf, lxml | 2.6, 2.7, 3.3, 3.4 3.5, 3.6 |
pyexcel-ods | ods | odfpy | same as above |
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-xlsxw | xlsx(write only) | XlsxWriter | Python 2 and 3 |
pyexcel-xlsxr | xlsx(read only) | lxml | same as above |
pyexcel-xlsbr | xlsx(read only) | pyxlsb | same as above |
pyexcel-odsr | read only for ods, fods | lxml | same as above |
pyexcel-odsw | write only for ods | loxun | same as above |
pyexcel-htmlr | html(read only) | lxml,html5lib | same as above |
pyexcel-pdfr | pdf(read only) | pdftables | Python 2 only. |
Package name | Supported file formats | Dependencies | Python versions |
---|---|---|---|
pyexcel-text | write only:rst, mediawiki, html, latex, grid, pipe, orgtbl, plain simple read only: ndjson r/w: json | tabulate | 2.6, 2.7, 3.3, 3.4 3.5, 3.6, pypy |
pyexcel-handsontable | handsontable in html | handsontable | same as above |
pyexcel-pygal | svg chart | pygal | 2.7, 3.3, 3.4, 3.5 3.6, pypy |
pyexcel-sortable | sortable table in html | csvtotable | same as above |
pyexcel-gantt | gantt chart in html | frappe-gantt | except pypy, same as above |
In order to manage the list of plugins installed, you need to use pip to add or remove a plugin. When you use virtualenv, you can have different plugins per virtual environment. In the situation where you have multiple plugins that does the same thing in your environment, you need to tell pyexcel which plugin to use per function call. For example, pyexcel-ods and pyexcel-odsr, and you want to get_array to use pyexcel-odsr. You need to append get_array(..., library='pyexcel-odsr').
Footnotes
[1] | zipped csv file |
[2] | zipped tsv file |
Acknowledgement
All great work have been done by odf, ezodf, xlrd, xlwt, tabulate and other individual developers. This library unites only the data access code.
License
New BSD License
Commit History @47f45822-2af7-43ca-80ca-59d28789cb19/main
- Update standards version to 4.5.0, no changes needed. Kali Janitor 3 years ago
- Set upstream metadata fields: Bug-Database, Bug-Submit, Repository, Repository-Browse. Kali Janitor 3 years ago
- Remove Python 2 module Sophie Brun 4 years ago
- Update URL in GitLab's CI configuration file Raphaël Hertzog 4 years ago
- Configure git-buildpackage for Kali Raphaël Hertzog 4 years ago
- Add GitLab's CI configuration file Raphaël Hertzog 4 years ago
- Run the tests during the build Sophie Brun 4 years ago
- Update debian/changelog Sophie Brun 4 years ago
- Add a patch to fix typo Sophie Brun 4 years ago
- Update dependencies Sophie Brun 4 years ago