Import upstream version 0.4.18 Kali Janitor 3 years ago
23 changed file(s) with 682 addition(s) and 324 deletion(s).
+0
-2
.gitignore
0 deb_dist
1 python_magic.egg-info
+0
-27
.travis.yml
0 language: python
1
2 # needed to use trusty
3 sudo: required
4
5 dist: trusty
6
7 python:
8 - "2.6"
9 - "2.7"
10 - "3.3"
11 - "3.4"
12 - "3.5"
13 - "3.6"
14 - "nightly"
15
16 install:
17 - pip install coveralls
18 - pip install codecov
19 - python setup.py install
20
21 script:
22 - coverage run setup.py test
23
24 after_success:
25 - coveralls
26 - codecov
  include *.py
  include LICENSE
- include test/testdata/*
- include test/*.sh
+ graft tests
+ global-exclude __pycache__
+ global-exclude *.py[co]
0 Metadata-Version: 2.1
1 Name: python-magic
2 Version: 0.4.18
3 Summary: File type identification using libmagic
4 Home-page: http://github.com/ahupp/python-magic
5 Author: Adam Hupp
6 Author-email: [email protected]
7 License: MIT
8 Description: # python-magic
9 [![PyPI version](https://badge.fury.io/py/python-magic.svg)](https://badge.fury.io/py/python-magic)
10 [![Build Status](https://travis-ci.org/ahupp/python-magic.svg?branch=master)](https://travis-ci.org/ahupp/python-magic)
11
12 python-magic is a Python interface to the libmagic file type
13 identification library. libmagic identifies file types by checking
14 their headers according to a predefined list of file types. This
15 functionality is exposed to the command line by the Unix command
16 `file`.
17
18 ## Usage
19
20 ```python
21 >>> import magic
22 >>> magic.from_file("testdata/test.pdf")
23 'PDF document, version 1.2'
24 # recommend using at least the first 2048 bytes, as less can produce incorrect identification
25 >>> magic.from_buffer(open("testdata/test.pdf").read(2048))
26 'PDF document, version 1.2'
27 >>> magic.from_file("testdata/test.pdf", mime=True)
28 'application/pdf'
29 ```
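
The `from_buffer` call above opens the file in text mode, which on Python 3 can fail or mangle binary data such as a PDF. A minimal sketch of reading the recommended first 2048 bytes in binary mode follows; `read_head` is a hypothetical helper, not part of python-magic.

```python
def read_head(path, size=2048):
    """Return up to `size` leading bytes of the file at `path`."""
    # Binary mode hands libmagic the raw bytes rather than decoded text.
    with open(path, "rb") as f:
        return f.read(size)

# Intended use, assuming python-magic and libmagic are installed:
#   magic.from_buffer(read_head("testdata/test.pdf"))
```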
30
31 There is also a `Magic` class that provides more direct control,
32 including overriding the magic database file and turning on character
33 encoding detection. This is not recommended for general use. In
34 particular, it's not safe for sharing across multiple threads and
35 will throw if this is attempted.
36
37 ```python
38 >>> f = magic.Magic(uncompress=True)
39 >>> f.from_file('testdata/test.gz')
40 'ASCII text (gzip compressed data, was "test", last modified: Sat Jun 28
41 21:32:52 2008, from Unix)'
42 ```
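
Because a `Magic` instance should not be shared across threads, one common pattern is to give each thread its own instance. The sketch below uses `threading.local`; `per_thread` and its `factory` argument are illustrative names, and in real use the factory would be something like `lambda: magic.Magic(uncompress=True)`.

```python
import threading

def per_thread(factory):
    """Return a getter that lazily builds one object per thread."""
    local = threading.local()

    def get():
        # Each thread sees its own attribute namespace on `local`.
        if not hasattr(local, "instance"):
            local.instance = factory()
        return local.instance

    return get
```

Calling the returned getter repeatedly from one thread yields the same object, while a different thread gets a separate one.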
43
44 You can also combine the flag options:
45
46 ```python
47 >>> f = magic.Magic(mime=True, uncompress=True)
48 >>> f.from_file('testdata/test.gz')
49 'text/plain'
50 ```
51
52 ## Installation
53
54 The current stable version of python-magic is available on PyPI and
55 can be installed by running `pip install python-magic`.
56
57 Other sources:
58
59 - PyPI: http://pypi.python.org/pypi/python-magic/
60 - GitHub: https://github.com/ahupp/python-magic
61
62 This module is a simple wrapper around the libmagic C library, and
63 that must be installed as well:
64
65 ### Debian/Ubuntu
66
67 $ sudo apt-get install libmagic1
68
69 ### Windows
70
71 You'll need DLLs for libmagic. @julian-r has uploaded a version of this project that includes binaries to PyPI:
72 https://pypi.python.org/pypi/python-magic-bin/0.4.14
73
74 Other sources of the libraries in the past have been [File for Windows](http://gnuwin32.sourceforge.net/packages/file.htm) . You will need to copy the file `magic` out of `[binary-zip]\share\misc`, and pass its location to `Magic(magic_file=...)`.
75
76 If you are using a 64-bit build of python, you'll need 64-bit libmagic binaries which can be found here: https://github.com/pidydx/libmagicwin64. Newer version can be found here: https://github.com/nscaife/file-windows.
77
78
79 ### OSX
80
81 - When using Homebrew: `brew install libmagic`
82 - When using macports: `port install file`
83
84 ### Troubleshooting
85
86 - 'MagicException: could not find any magic files!': some
87 installations of libmagic do not correctly point to their magic
88 database file. Try specifying the path to the file explicitly in the
89 constructor: `magic.Magic(magic_file="path_to_magic_file")`.
90
91 - 'WindowsError: [Error 193] %1 is not a valid Win32 application':
92 Attempting to run the 32-bit libmagic DLL in a 64-bit build of
93 python will fail with this error. Here are 64-bit builds of libmagic for windows: https://github.com/pidydx/libmagicwin64
94
95 - 'WindowsError: exception: access violation writing 0x00000000 ' This may indicate you are mixing
96 Windows Python and Cygwin Python. Make sure your libmagic and python builds are consistent.
97
98
99 ## Bug Reports
100
101 python-magic is a thin layer over the libmagic C library.
102 Historically, most bugs that have been reported against python-magic
103 are actually bugs in libmagic; libmagic bugs can be reported on their
104 tracker here: https://bugs.astron.com/my_view_page.php. If you're not
105 sure where the bug lies feel free to file an issue on GitHub and I can
106 triage it.
107
108 ## Running the tests
109
110 To run the tests across 3 recent Ubuntu LTS releases (depends on Docker):
111
112 $ ./test_docker.sh
113
114 To run tests locally across all available python versions:
115
116 $ ./test/run.py
117
118 To run against a specific python version:
119
120 $ LC_ALL=en_US.UTF-8 python3 test/test.py
121
122 ## Versioning
123
124 Minor version bumps should be backwards compatible. Major bumps are not.
125
126 ## Name Conflict
127
128 There are, sadly, two libraries which use the module name `magic`.
129 Both have been around for quite a while. If you are using this module
130 and get an error using a method like `open`, your code is expecting
131 the other one. Hopefully one day these will be reconciled.
132
133
134 ## Author
135
136 Written by Adam Hupp in 2001 for a project that never got off the
137 ground. It originally used SWIG for the C library bindings, but
138 switched to ctypes once that was part of the python standard library.
139
140 You can contact me via my [website](http://hupp.org/adam) or
141 [GitHub](http://github.com/ahupp).
142
143 ## Contributors
144
145 Thanks to these folks on github who submitted features and bug fixes.
146
147 - Amit Sethi
148 - [bigben87](https://github.com/bigben87)
149 - [fallgesetz](https://github.com/fallgesetz)
150 - [FlaPer87](https://github.com/FlaPer87)
151 - [Hugo van Kemenade](https://github.com/hugovk)
152 - [lukenowak](https://github.com/lukenowak)
153 - NicolasDelaby
154 - [email protected]
155 - SimpleSeb
156 - [tehmaze](https://github.com/tehmaze)
157
158 ## License
159
160 python-magic is distributed under the MIT license. See the included
161 LICENSE file for details.
162
163 I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
164
165
166 Keywords: mime magic file
167 Platform: UNKNOWN
168 Classifier: Intended Audience :: Developers
169 Classifier: License :: OSI Approved :: MIT License
170 Classifier: Programming Language :: Python
171 Classifier: Programming Language :: Python :: 2
172 Classifier: Programming Language :: Python :: 2.7
173 Classifier: Programming Language :: Python :: 3
174 Classifier: Programming Language :: Python :: 3.5
175 Classifier: Programming Language :: Python :: 3.6
176 Classifier: Programming Language :: Python :: 3.7
177 Classifier: Programming Language :: Python :: 3.8
178 Classifier: Programming Language :: Python :: Implementation :: CPython
179 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
180 Description-Content-Type: text/markdown
11 [![PyPI version](https://badge.fury.io/py/python-magic.svg)](https://badge.fury.io/py/python-magic)
22 [![Build Status](https://travis-ci.org/ahupp/python-magic.svg?branch=master)](https://travis-ci.org/ahupp/python-magic)
33
4 python-magic is a python interface to the libmagic file type
4 python-magic is a Python interface to the libmagic file type
55 identification library. libmagic identifies file types by checking
66 their headers according to a predefined list of file types. This
77 functionality is exposed to the command line by the Unix command
1313 >>> import magic
1414 >>> magic.from_file("testdata/test.pdf")
1515 'PDF document, version 1.2'
16 >>> magic.from_buffer(open("testdata/test.pdf").read(1024))
16 # recommend using at least the first 2048 bytes, as less can produce incorrect identification
17 >>> magic.from_buffer(open("testdata/test.pdf").read(2048))
1718 'PDF document, version 1.2'
1819 >>> magic.from_file("testdata/test.pdf", mime=True)
1920 'application/pdf'
4041 'text/plain'
4142 ```
4243
43 ## Name Conflict
44
45 There are, sadly, two libraries which use the module name `magic`. Both have been around for quite a while.If you are using this module and get an error using a method like `open`, your code is expecting the other one. Hopefully one day these will be reconciled.
46
4744 ## Installation
4845
49 The current stable version of python-magic is available on pypi and
46 The current stable version of python-magic is available on PyPI and
5047 can be installed by running `pip install python-magic`.
5148
5249 Other sources:
5350
54 - pypi: http://pypi.python.org/pypi/python-magic/
55 - github: https://github.com/ahupp/python-magic
51 - PyPI: http://pypi.python.org/pypi/python-magic/
52 - GitHub: https://github.com/ahupp/python-magic
5653
57 ### Dependencies
54 This module is a simple wrapper around the libmagic C library, and
55 that must be installed as well:
5856
59 On Windows, copy magic1.dll, regex2.dll, and zlib1.dll onto your PATH from the Binaries and Dependencies zipfiles provided by the [File for Windows](http://gnuwin32.sourceforge.net/packages/file.htm) project. You will need to copy the file `magic` out of `[binary-zip]\share\misc`, and pass it's location to `Magic(magic_file=...)`. If you are using a 64-bit build of python, you'll need 64-bit libmagic binaries which can be found here: https://github.com/pidydx/libmagicwin64 (note: untested)
57 ### Debian/Ubuntu
6058
61 On OSX:
59 $ sudo apt-get install libmagic1
60
61 ### Windows
62
63 You'll need DLLs for libmagic. @julian-r has uploaded a version of this project that includes binaries to PyPI:
64 https://pypi.python.org/pypi/python-magic-bin/0.4.14
65
66 Other sources of the libraries in the past have been [File for Windows](http://gnuwin32.sourceforge.net/packages/file.htm) . You will need to copy the file `magic` out of `[binary-zip]\share\misc`, and pass its location to `Magic(magic_file=...)`.
67
68 If you are using a 64-bit build of python, you'll need 64-bit libmagic binaries which can be found here: https://github.com/pidydx/libmagicwin64. Newer version can be found here: https://github.com/nscaife/file-windows.
69
70
71 ### OSX
6272
6373 - When using Homebrew: `brew install libmagic`
6474 - When using macports: `port install file`
7787 - 'WindowsError: exception: access violation writing 0x00000000 ' This may indicate you are mixing
7888 Windows Python and Cygwin Python. Make sure your libmagic and python builds are consistent.
7989
90
91 ## Bug Reports
92
93 python-magic is a thin layer over the libmagic C library.
94 Historically, most bugs that have been reported against python-magic
95 are actually bugs in libmagic; libmagic bugs can be reported on their
96 tracker here: https://bugs.astron.com/my_view_page.php. If you're not
97 sure where the bug lies feel free to file an issue on GitHub and I can
98 triage it.
99
100 ## Running the tests
101
102 To run the tests across 3 recent Ubuntu LTS releases (depends on Docker):
103
104 $ ./test_docker.sh
105
106 To run tests locally across all available python versions:
107
108 $ ./test/run.py
109
110 To run against a specific python version:
111
112 $ LC_ALL=en_US.UTF-8 python3 test/test.py
113
114 ## Versioning
115
116 Minor version bumps should be backwards compatible. Major bumps are not.
117
118 ## Name Conflict
119
120 There are, sadly, two libraries which use the module name `magic`.
121 Both have been around for quite a while. If you are using this module
122 and get an error using a method like `open`, your code is expecting
123 the other one. Hopefully one day these will be reconciled.
124
125
80126 ## Author
81127
82128 Written by Adam Hupp in 2001 for a project that never got off the
84130 switched to ctypes once that was part of the python standard library.
85131
86132 You can contact me via my [website](http://hupp.org/adam) or
87 [github](http://github.com/ahupp).
133 [GitHub](http://github.com/ahupp).
88134
89135 ## Contributors
90136
91 Thanks to these folks on github who submitted features and bugfixes.
137 Thanks to these folks on github who submitted features and bug fixes.
92138
93139 - Amit Sethi
94140 - [bigben87](https://github.com/bigben87)
95141 - [fallgesetz](https://github.com/fallgesetz)
96142 - [FlaPer87](https://github.com/FlaPer87)
143 - [Hugo van Kemenade](https://github.com/hugovk)
97144 - [lukenowak](https://github.com/lukenowak)
98145 - NicolasDelaby
99146 - [email protected]
105152 python-magic is distributed under the MIT license. See the included
106153 LICENSE file for details.
107154
155 I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
108156
1313 'PDF document, version 1.2'
1414 >>>
1515
16
1716 """
1817
1918 import sys
2019 import glob
21 import os.path
2220 import ctypes
2321 import ctypes.util
2422 import threading
2523
26 from ctypes import c_char_p, c_int, c_size_t, c_void_p
24 from ctypes import c_char_p, c_int, c_size_t, c_void_p, byref, POINTER
2725
2826
2927 class MagicException(Exception):
3533 class Magic:
3634 """
3735 Magic is a wrapper around the libmagic C library.
38
3936 """
4037
4138 def __init__(self, mime=False, magic_file=None, mime_encoding=False,
42 keep_going=False, uncompress=False):
39 keep_going=False, uncompress=False, raw=False):
4340 """
4441 Create a new libmagic wrapper.
4542
4845 magic_file - use a mime database other than the system default
4946 keep_going - don't stop at the first match, keep going
5047 uncompress - Try to look inside compressed files.
48 raw - Do not try to decode "non-printable" chars.
5149 """
5250 self.flags = MAGIC_NONE
5351 if mime:
54 self.flags |= MAGIC_MIME
52 self.flags |= MAGIC_MIME_TYPE
5553 if mime_encoding:
5654 self.flags |= MAGIC_MIME_ENCODING
5755 if keep_going:
5856 self.flags |= MAGIC_CONTINUE
59
6057 if uncompress:
6158 self.flags |= MAGIC_COMPRESS
59 if raw:
60 self.flags |= MAGIC_RAW
6261
6362 self.cookie = magic_open(self.flags)
6463 self.lock = threading.Lock()
65
64
6665 magic_load(self.cookie, magic_file)
66
67
68 # For https://github.com/ahupp/python-magic/issues/190
69 # libmagic has fixed internal limits that some files exceed, causing
70 # an error. We can avoid this (at least for the sample file given)
71 # by bumping the limit up. It's not clear if this is a general solution
72 # or whether other internal limits should be increased, but given
73 # the lack of other reports I'll assume this is rare.
74 if _has_param:
75 self.setparam(MAGIC_PARAM_NAME_MAX, 64)
6776
6877 def from_buffer(self, buf):
6978 """
7180 """
7281 with self.lock:
7382 try:
83 # if we're on python3, convert buf to bytes
84 # otherwise this string is passed as wchar*
85 # which is not what libmagic expects
86 if type(buf) == str and str != bytes:
87 buf = buf.encode('utf-8', errors='replace')
7488 return maybe_decode(magic_buffer(self.cookie, buf))
7589 except MagicException as e:
7690 return self._handle509Bug(e)
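
The added check converts a Python 3 `str` to bytes before it reaches libmagic. A standalone sketch of that conversion, with `to_bytes` as a hypothetical helper name:

```python
def to_bytes(buf):
    """Encode text as UTF-8; pass bytes-like input through unchanged."""
    # On Python 2, str is bytes, so the second condition skips the encode.
    if isinstance(buf, str) and str is not bytes:
        return buf.encode("utf-8", errors="replace")
    return buf
```

With `errors="replace"`, characters that cannot be encoded (such as lone surrogates) become `?` instead of raising.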
89103 # libmagic 5.09 has a bug where it might fail to identify the
90104 # mimetype of a file and returns null from magic_file (and
91105 # likely _buffer), but also does not return an error message.
92 if e.message is None and (self.flags & MAGIC_MIME):
106 if e.message is None and (self.flags & MAGIC_MIME_TYPE):
93107 return "application/octet-stream"
94108 else:
95109 raise e
96
110
111 def setparam(self, param, val):
112 return magic_setparam(self.cookie, param, val)
113
114 def getparam(self, param):
115 return magic_getparam(self.cookie, param)
116
97117 def __del__(self):
98118 # no _thread_check here because there can be no other
99119 # references to this object at this point.
111131
112132 _instances = {}
113133
134
114135 def _get_magic_type(mime):
115136 i = _instances.get(mime)
116137 if i is None:
117138 i = _instances[mime] = Magic(mime=mime)
118139 return i
119140
141
120142 def from_file(filename, mime=False):
121143 """"
122144 Accepts a filename and returns the detected filetype. Return
129151 m = _get_magic_type(mime)
130152 return m.from_file(filename)
131153
154
132155 def from_buffer(buffer, mime=False):
133156 """
134157 Accepts a binary string and returns the detected filetype. Return
142165 return m.from_buffer(buffer)
143166
144167
145
146
147168 libmagic = None
148169 # Let's try to find magic or magic1
149 dll = ctypes.util.find_library('magic') or ctypes.util.find_library('magic1') or ctypes.util.find_library('cygmagic-1')
150
151 # This is necessary because find_library returns None if it doesn't find the library
170 dll = ctypes.util.find_library('magic') \
171 or ctypes.util.find_library('magic1') \
172 or ctypes.util.find_library('cygmagic-1') \
173 or ctypes.util.find_library('libmagic-1') \
174 or ctypes.util.find_library('msys-magic-1') #for MSYS2
175
176 # necessary because find_library returns None if it doesn't find the library
152177 if dll:
153178 libmagic = ctypes.CDLL(dll)
154179
155180 if not libmagic or not libmagic._name:
156 windows_dlls = ['magic1.dll','cygmagic-1.dll']
181 windows_dlls = ['magic1.dll', 'cygmagic-1.dll', 'libmagic-1.dll', 'msys-magic-1.dll']
157182 platform_to_lib = {'darwin': ['/opt/local/lib/libmagic.dylib',
158183 '/usr/local/lib/libmagic.dylib'] +
159 # Assumes there will only be one version installed
160 glob.glob('/usr/local/Cellar/libmagic/*/lib/libmagic.dylib'),
184 # Assumes there will only be one version installed
185 glob.glob('/usr/local/Cellar/libmagic/*/lib/libmagic.dylib'), # flake8:noqa
161186 'win32': windows_dlls,
162187 'cygwin': windows_dlls,
163 'linux': ['libmagic.so.1'], # fallback for some Linuxes (e.g. Alpine) where library search does not work
188 'linux': ['libmagic.so.1'], # fallback for some Linuxes (e.g. Alpine) where library search does not work # flake8:noqa
164189 }
165190 platform = 'linux' if sys.platform.startswith('linux') else sys.platform
166191 for dll in platform_to_lib.get(platform, []):
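
The lookup chain above tries several library names in order and keeps the first hit. That logic can be sketched generically as below; `find_first_library` is an illustrative name, and the `finder` parameter exists only so the sketch can be exercised without libmagic installed.

```python
import ctypes.util

def find_first_library(names, finder=ctypes.util.find_library):
    """Return the first library path the finder resolves, else None."""
    for name in names:
        path = finder(name)
        if path:  # find_library returns None when nothing matches
            return path
    return None

# Names taken from the lookup chain in the diff above.
candidates = ["magic", "magic1", "cygmagic-1", "libmagic-1", "msys-magic-1"]
```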
176201
177202 magic_t = ctypes.c_void_p
178203
204
179205 def errorcheck_null(result, func, args):
180206 if result is None:
181207 err = magic_error(args[0])
183209 else:
184210 return result
185211
212
186213 def errorcheck_negative_one(result, func, args):
187 if result is -1:
214 if result == -1:
188215 err = magic_error(args[0])
189216 raise MagicException(err)
190217 else:
198225 return s
199226 else:
200227 return s.decode('utf-8')
201
228
229
202230 def coerce_filename(filename):
203231 if filename is None:
204232 return None
205233
206234 # ctypes will implicitly convert unicode strings to bytes with
207 # .encode('ascii'). If you use the filesystem encoding
235 # .encode('ascii'). If you use the filesystem encoding
208236 # then you'll get inconsistent behavior (crashes) depending on the user's
209237 # LANG environment variable
210238 is_unicode = (sys.version_info[0] <= 2 and
212240 (sys.version_info[0] >= 3 and
213241 isinstance(filename, str))
214242 if is_unicode:
215 return filename.encode('utf-8')
243 return filename.encode('utf-8', 'surrogateescape')
216244 else:
217245 return filename
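
The switch to `surrogateescape` matters for filenames whose bytes are not valid UTF-8. Python represents such bytes as lone surrogates in `str`, and a strict encode rejects them. A small demonstration, using a made-up filename:

```python
raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# How Python typically surfaces such a name as str on POSIX systems.
name = raw.decode("utf-8", "surrogateescape")

try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogateescape restores the original bytes exactly.
round_trip = name.encode("utf-8", "surrogateescape")
```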
246
218247
219248 magic_open = libmagic.magic_open
220249 magic_open.restype = magic_t
237266 _magic_file.argtypes = [magic_t, c_char_p]
238267 _magic_file.errcheck = errorcheck_null
239268
269
240270 def magic_file(cookie, filename):
241271 return _magic_file(cookie, coerce_filename(filename))
242272
245275 _magic_buffer.argtypes = [magic_t, c_void_p, c_size_t]
246276 _magic_buffer.errcheck = errorcheck_null
247277
278
248279 def magic_buffer(cookie, buf):
249280 return _magic_buffer(cookie, buf, len(buf))
250281
254285 _magic_load.argtypes = [magic_t, c_char_p]
255286 _magic_load.errcheck = errorcheck_negative_one
256287
288
257289 def magic_load(cookie, filename):
258290 return _magic_load(cookie, coerce_filename(filename))
259291
269301 magic_compile.restype = c_int
270302 magic_compile.argtypes = [magic_t, c_char_p]
271303
272
304 _has_param = False
305 if hasattr(libmagic, 'magic_setparam') and hasattr(libmagic, 'magic_getparam'):
306 _has_param = True
307 _magic_setparam = libmagic.magic_setparam
308 _magic_setparam.restype = c_int
309 _magic_setparam.argtypes = [magic_t, c_int, POINTER(c_size_t)]
310 _magic_setparam.errcheck = errorcheck_negative_one
311
312 _magic_getparam = libmagic.magic_getparam
313 _magic_getparam.restype = c_int
314 _magic_getparam.argtypes = [magic_t, c_int, POINTER(c_size_t)]
315 _magic_getparam.errcheck = errorcheck_negative_one
316
317 def magic_setparam(cookie, param, val):
318 if not _has_param:
319 raise NotImplementedError("magic_setparam not implemented")
320 v = c_size_t(val)
321 return _magic_setparam(cookie, param, byref(v))
322
323 def magic_getparam(cookie, param):
324 if not _has_param:
325 raise NotImplementedError("magic_getparam not implemented")
326 val = c_size_t()
327 _magic_getparam(cookie, param, byref(val))
328 return val.value
329
330 _has_version = False
331 if hasattr(libmagic, "magic_version"):
332 _has_version = True
333 magic_version = libmagic.magic_version
334 magic_version.restype = c_int
335 magic_version.argtypes = []
336
337 def version():
338 if not _has_version:
339 raise NotImplementedError("magic_version not implemented")
340 return magic_version()
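
`magic_getparam` and `magic_setparam` pass their value through a `size_t *`, which is why the wrappers build a `c_size_t` and hand it over with `byref`. The sketch below shows the same calling convention against a stand-in callback rather than libmagic itself; `fake_getparam`, its return values, and the use of `RuntimeError` are assumptions made for illustration.

```python
from ctypes import CFUNCTYPE, POINTER, byref, c_int, c_size_t

# Stand-in for a C function with signature: int f(int param, size_t *out).
@CFUNCTYPE(c_int, c_int, POINTER(c_size_t))
def fake_getparam(param, out):
    if param != 1:
        return -1          # mimic the C convention of -1 on failure
    out[0] = 64            # write the result through the pointer
    return 0

def getparam(param):
    val = c_size_t()
    # Compare with ==, not `is`: identity of ints is an implementation detail.
    if fake_getparam(param, byref(val)) == -1:
        raise RuntimeError("unknown parameter %d" % param)
    return val.value
```

The `== -1` comparison mirrors the fix to `errorcheck_negative_one` earlier in this diff.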
273341
274342 MAGIC_NONE = 0x000000 # No flags
275343 MAGIC_DEBUG = 0x000001 # Turn on debugging
276344 MAGIC_SYMLINK = 0x000002 # Follow symlinks
277345 MAGIC_COMPRESS = 0x000004 # Check inside compressed files
278346 MAGIC_DEVICES = 0x000008 # Look at the contents of devices
347 MAGIC_MIME_TYPE = 0x000010 # Return a mime string
348 MAGIC_MIME_ENCODING = 0x000400 # Return the MIME encoding
349 # TODO: should be
350 # MAGIC_MIME = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING
279351 MAGIC_MIME = 0x000010 # Return a mime string
280 MAGIC_MIME_ENCODING = 0x000400 # Return the MIME encoding
352
281353 MAGIC_CONTINUE = 0x000020 # Return all matches
282354 MAGIC_CHECK = 0x000040 # Print warnings to stderr
283355 MAGIC_PRESERVE_ATIME = 0x000080 # Restore access time on exit
293365 MAGIC_NO_CHECK_TROFF = 0x040000 # Don't check ascii/troff
294366 MAGIC_NO_CHECK_FORTRAN = 0x080000 # Don't check ascii/fortran
295367 MAGIC_NO_CHECK_TOKENS = 0x100000 # Don't check ascii/tokens
368
369 MAGIC_PARAM_INDIR_MAX = 0 # Recursion limit for indirect magic
370 MAGIC_PARAM_NAME_MAX = 1 # Use count limit for name/use magic
371 MAGIC_PARAM_ELF_PHNUM_MAX = 2 # Max ELF program sections processed
372 MAGIC_PARAM_ELF_SHNUM_MAX = 3 # Max ELF sections processed
373 MAGIC_PARAM_ELF_NOTES_MAX = 4 # Max ELF notes processed
374 MAGIC_PARAM_REGEX_MAX = 5 # Length limit for regex searches
375 MAGIC_PARAM_BYTES_MAX = 6 # Max number of bytes to read from file
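
The `Magic` constructor combines these constants into a single bitmask. A sketch of that composition, using only values listed above; `build_flags` is an illustrative name, and the final line shows the combined value the TODO comment describes for `MAGIC_MIME`.

```python
MAGIC_NONE = 0x000000
MAGIC_COMPRESS = 0x000004
MAGIC_MIME_TYPE = 0x000010
MAGIC_CONTINUE = 0x000020
MAGIC_MIME_ENCODING = 0x000400

def build_flags(mime=False, mime_encoding=False,
                keep_going=False, uncompress=False):
    """OR together the flags selected by the keyword arguments."""
    flags = MAGIC_NONE
    if mime:
        flags |= MAGIC_MIME_TYPE
    if mime_encoding:
        flags |= MAGIC_MIME_ENCODING
    if keep_going:
        flags |= MAGIC_CONTINUE
    if uncompress:
        flags |= MAGIC_COMPRESS
    return flags

# Type and encoding together, as the TODO suggests MAGIC_MIME should be.
MIME_TYPE_AND_ENCODING = MAGIC_MIME_TYPE | MAGIC_MIME_ENCODING
```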
0 Metadata-Version: 2.1
1 Name: python-magic
2 Version: 0.4.18
3 Summary: File type identification using libmagic
4 Home-page: http://github.com/ahupp/python-magic
5 Author: Adam Hupp
6 Author-email: [email protected]
7 License: MIT
8 Description: # python-magic
9 [![PyPI version](https://badge.fury.io/py/python-magic.svg)](https://badge.fury.io/py/python-magic)
10 [![Build Status](https://travis-ci.org/ahupp/python-magic.svg?branch=master)](https://travis-ci.org/ahupp/python-magic)
11
12 python-magic is a Python interface to the libmagic file type
13 identification library. libmagic identifies file types by checking
14 their headers according to a predefined list of file types. This
15 functionality is exposed to the command line by the Unix command
16 `file`.
17
18 ## Usage
19
20 ```python
21 >>> import magic
22 >>> magic.from_file("testdata/test.pdf")
23 'PDF document, version 1.2'
24 # recommend using at least the first 2048 bytes, as less can produce incorrect identification
25 >>> magic.from_buffer(open("testdata/test.pdf").read(2048))
26 'PDF document, version 1.2'
27 >>> magic.from_file("testdata/test.pdf", mime=True)
28 'application/pdf'
29 ```
30
31 There is also a `Magic` class that provides more direct control,
32 including overriding the magic database file and turning on character
33 encoding detection. This is not recommended for general use. In
34 particular, it's not safe for sharing across multiple threads and
35 will throw if this is attempted.
36
37 ```python
38 >>> f = magic.Magic(uncompress=True)
39 >>> f.from_file('testdata/test.gz')
40 'ASCII text (gzip compressed data, was "test", last modified: Sat Jun 28
41 21:32:52 2008, from Unix)'
42 ```
43
44 You can also combine the flag options:
45
46 ```python
47 >>> f = magic.Magic(mime=True, uncompress=True)
48 >>> f.from_file('testdata/test.gz')
49 'text/plain'
50 ```
51
52 ## Installation
53
54 The current stable version of python-magic is available on PyPI and
55 can be installed by running `pip install python-magic`.
56
57 Other sources:
58
59 - PyPI: http://pypi.python.org/pypi/python-magic/
60 - GitHub: https://github.com/ahupp/python-magic
61
62 This module is a simple wrapper around the libmagic C library, and
63 that must be installed as well:
64
65 ### Debian/Ubuntu
66
67 $ sudo apt-get install libmagic1
68
69 ### Windows
70
71 You'll need DLLs for libmagic. @julian-r has uploaded a version of this project that includes binaries to PyPI:
72 https://pypi.python.org/pypi/python-magic-bin/0.4.14
73
74 Other sources of the libraries in the past have been [File for Windows](http://gnuwin32.sourceforge.net/packages/file.htm) . You will need to copy the file `magic` out of `[binary-zip]\share\misc`, and pass its location to `Magic(magic_file=...)`.
75
76 If you are using a 64-bit build of python, you'll need 64-bit libmagic binaries which can be found here: https://github.com/pidydx/libmagicwin64. Newer version can be found here: https://github.com/nscaife/file-windows.
77
78
79 ### OSX
80
81 - When using Homebrew: `brew install libmagic`
82 - When using macports: `port install file`
83
84 ### Troubleshooting
85
86 - 'MagicException: could not find any magic files!': some
87 installations of libmagic do not correctly point to their magic
88 database file. Try specifying the path to the file explicitly in the
89 constructor: `magic.Magic(magic_file="path_to_magic_file")`.
90
91 - 'WindowsError: [Error 193] %1 is not a valid Win32 application':
92 Attempting to run the 32-bit libmagic DLL in a 64-bit build of
93 python will fail with this error. Here are 64-bit builds of libmagic for windows: https://github.com/pidydx/libmagicwin64
94
95 - 'WindowsError: exception: access violation writing 0x00000000 ' This may indicate you are mixing
96 Windows Python and Cygwin Python. Make sure your libmagic and python builds are consistent.
97
98
99 ## Bug Reports
100
101 python-magic is a thin layer over the libmagic C library.
102 Historically, most bugs that have been reported against python-magic
103 are actually bugs in libmagic; libmagic bugs can be reported on their
104 tracker here: https://bugs.astron.com/my_view_page.php. If you're not
105 sure where the bug lies feel free to file an issue on GitHub and I can
106 triage it.
107
108 ## Running the tests
109
110 To run the tests across 3 recent Ubuntu LTS releases (depends on Docker):
111
112 $ ./test_docker.sh
113
114 To run tests locally across all available python versions:
115
116 $ ./test/run.py
117
118 To run against a specific python version:
119
120 $ LC_ALL=en_US.UTF-8 python3 test/test.py
121
122 ## Versioning
123
124 Minor version bumps should be backwards compatible. Major bumps are not.
125
126 ## Name Conflict
127
128 There are, sadly, two libraries which use the module name `magic`.
129 Both have been around for quite a while. If you are using this module
130 and get an error using a method like `open`, your code is expecting
131 the other one. Hopefully one day these will be reconciled.
132
133
134 ## Author
135
136 Written by Adam Hupp in 2001 for a project that never got off the
137 ground. It originally used SWIG for the C library bindings, but
138 switched to ctypes once that was part of the python standard library.
139
140 You can contact me via my [website](http://hupp.org/adam) or
141 [GitHub](http://github.com/ahupp).
142
143 ## Contributors
144
145 Thanks to these folks on github who submitted features and bug fixes.
146
147 - Amit Sethi
148 - [bigben87](https://github.com/bigben87)
149 - [fallgesetz](https://github.com/fallgesetz)
150 - [FlaPer87](https://github.com/FlaPer87)
151 - [Hugo van Kemenade](https://github.com/hugovk)
152 - [lukenowak](https://github.com/lukenowak)
153 - NicolasDelaby
154 - [email protected]
155 - SimpleSeb
156 - [tehmaze](https://github.com/tehmaze)
157
158 ## License
159
160 python-magic is distributed under the MIT license. See the included
161 LICENSE file for details.
162
163 I am providing code in the repository to you under an open source license. Because this is my personal repository, the license you receive to my code is from me and not my employer (Facebook).
164
165
166 Keywords: mime magic file
167 Platform: UNKNOWN
168 Classifier: Intended Audience :: Developers
169 Classifier: License :: OSI Approved :: MIT License
170 Classifier: Programming Language :: Python
171 Classifier: Programming Language :: Python :: 2
172 Classifier: Programming Language :: Python :: 2.7
173 Classifier: Programming Language :: Python :: 3
174 Classifier: Programming Language :: Python :: 3.5
175 Classifier: Programming Language :: Python :: 3.6
176 Classifier: Programming Language :: Python :: 3.7
177 Classifier: Programming Language :: Python :: 3.8
178 Classifier: Programming Language :: Python :: Implementation :: CPython
179 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
180 Description-Content-Type: text/markdown
0 LICENSE
1 MANIFEST.in
2 README.md
3 __init__.py
4 magic.py
5 setup.cfg
6 setup.py
7 python_magic.egg-info/PKG-INFO
8 python_magic.egg-info/SOURCES.txt
9 python_magic.egg-info/dependency_links.txt
10 python_magic.egg-info/top_level.txt
11 test/test.py
  [global]
- command_packages=stdeb.command
+ command_packages = stdeb.command

  [bdist_wheel]
  universal = 1
+
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
11 # -*- coding: utf-8 -*-
22
33 from setuptools import setup
4 import io
5 import os
6
7
8 def read(file_name):
9 """Read a text file and return the content as a string."""
10 with io.open(os.path.join(os.path.dirname(__file__), file_name),
11 encoding='utf-8') as f:
12 return f.read()
413
514 setup(name='python-magic',
615 description='File type identification using libmagic',
716 author='Adam Hupp',
817 author_email='[email protected]',
918 url="http://github.com/ahupp/python-magic",
10 version='0.4.13',
19 version='0.4.18',
1120 py_modules=['magic'],
12 long_description="""This module uses ctypes to access the libmagic file type
13 identification library. It makes use of the local magic database and
14 supports both textual and MIME-type output.
15 """,
21 long_description=read('README.md'),
22 long_description_content_type='text/markdown',
1623 keywords="mime magic file",
1724 license="MIT",
18 test_suite='test',
25 python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*',
1926 classifiers=[
2027 'Intended Audience :: Developers',
2128 'License :: OSI Approved :: MIT License',
2229 'Programming Language :: Python',
2330 'Programming Language :: Python :: 2',
31 'Programming Language :: Python :: 2.7',
2432 'Programming Language :: Python :: 3',
33 'Programming Language :: Python :: 3.5',
34 'Programming Language :: Python :: 3.6',
35 'Programming Language :: Python :: 3.7',
36 'Programming Language :: Python :: 3.8',
37 'Programming Language :: Python :: Implementation :: CPython',
2538 ],
2639 )
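Note on the `python_requires` line added above: the specifier `>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*` accepts Python 2.7 and Python 3.5 or later, which matches the new classifiers. The sketch below spells that rule out by hand. The helper name `is_supported` is hypothetical and is not part of setup.py; pip evaluates the specifier with its own version-matching logic, not with code like this.

```python
def is_supported(major, minor):
    """Hand-written approximation of the python_requires rule (assumption:
    only the major and minor version components matter)."""
    # '>=2.7' rejects everything older than 2.7.
    if (major, minor) < (2, 7):
        return False
    # '!=3.0.*' through '!=3.4.*' reject the early 3.x releases.
    if major == 3 and minor <= 4:
        return False
    return True


for version in [(2, 6), (2, 7), (3, 4), (3, 5), (3, 8)]:
    print(version, is_supported(*version))
```

The loop prints the verdict for a few interpreter versions, including ones dropped from the old `.travis.yml` matrix.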
+0
-3
stdeb.cfg
0 [python-magic]
1 Depends: libmagic1
2 Conflicts: python-magic
+0
-0
test/__init__.py
(Empty file)
+0
-12
test/run.sh
0 #!/bin/sh
1
2 set -e
3
4 # ensure we can use unicode filenames in the test
5 export LC_ALL=en_US.UTF-8
6 THISDIR=`dirname $0`
7 export PYTHONPATH=${THISDIR}/..
8
9 python2.6 ${THISDIR}/test.py
10 python2.7 ${THISDIR}/test.py
11 python3 ${THISDIR}/test.py
0 import os, sys
0 import os
11 # for output which reports a local time
22 os.environ['TZ'] = 'GMT'
3
4 if os.environ.get('LC_ALL','') != 'en_US.UTF-8':
5 # this ensures we're in a utf-8 default filesystem encoding which is
6 # necessary for some tests
7 raise Exception("must run `export LC_ALL=en_US.UTF-8` before running test suite")
8
39 import shutil
410 import os.path
511 import unittest
612
713 import magic
14 import sys
815
916 class MagicTest(unittest.TestCase):
10 TESTDATA_DIR = os.path.join(os.path.dirname(__file__), 'testdata')
17 TESTDATA_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'testdata')
1118
12 def assert_values(self, m, expected_values):
19 def test_version(self):
20 try:
21 self.assertTrue(magic.version() > 0)
22 except NotImplementedError:
23 pass
24
25 def test_fs_encoding(self):
26 self.assertEqual('utf-8', sys.getfilesystemencoding().lower())
27
28 def assert_values(self, m, expected_values, buf_equals_file=True):
1329 for filename, expected_value in expected_values.items():
1430 try:
1531 filename = os.path.join(self.TESTDATA_DIR, filename)
1632 except TypeError:
17 filename = os.path.join(self.TESTDATA_DIR.encode('utf-8'), filename)
33 filename = os.path.join(
34 self.TESTDATA_DIR.encode('utf-8'), filename)
1835
19
2036 if type(expected_value) is not tuple:
2137 expected_value = (expected_value,)
2238
23 for i in expected_value:
24 with open(filename, 'rb') as f:
25 buf_value = m.from_buffer(f.read())
39 with open(filename, 'rb') as f:
40 buf_value = m.from_buffer(f.read())
2641
27 file_value = m.from_file(filename)
28 if buf_value == i and file_value == i:
29 break
30 else:
31 self.assertTrue(False, "no match for " + repr(expected_value))
32
42 file_value = m.from_file(filename)
43
44 if buf_equals_file:
45 self.assertEqual(buf_value, file_value)
46
47 for value in (buf_value, file_value):
48 self.assertIn(value, expected_value)
49
50 def test_from_file_str_and_bytes(self):
51 filename = os.path.join(self.TESTDATA_DIR, "test.pdf")
52
53 self.assertEqual('application/pdf',
54 magic.from_file(filename, mime=True))
55 self.assertEqual('application/pdf',
56 magic.from_file(filename.encode('utf-8'), mime=True))
57
58 def test_from_buffer_str_and_bytes(self):
59 m = magic.Magic(mime=True)
60 s = '#!/usr/bin/env python\nprint("foo")'
61 self.assertEqual("text/x-python", m.from_buffer(s))
62 b = b'#!/usr/bin/env python\nprint("foo")'
63 self.assertEqual("text/x-python", m.from_buffer(b))
64
3365 def test_mime_types(self):
34 dest = os.path.join(MagicTest.TESTDATA_DIR, b'\xce\xbb'.decode('utf-8'))
66 dest = os.path.join(MagicTest.TESTDATA_DIR,
67 b'\xce\xbb'.decode('utf-8'))
3568 shutil.copyfile(os.path.join(MagicTest.TESTDATA_DIR, 'lambda'), dest)
3669 try:
3770 m = magic.Magic(mime=True)
3871 self.assert_values(m, {
39 'magic.pyc': 'application/octet-stream',
72 'magic._pyc_': 'application/octet-stream',
4073 'test.pdf': 'application/pdf',
41 'test.gz': 'application/gzip',
74 'test.gz': ('application/gzip', 'application/x-gzip'),
75 'test.snappy.parquet': 'application/octet-stream',
4276 'text.txt': 'text/plain',
4377 b'\xce\xbb'.decode('utf-8'): 'text/plain',
4478 b'\xce\xbb': 'text/plain',
4882
4983 def test_descriptions(self):
5084 m = magic.Magic()
51 os.environ['TZ'] = 'UTC' # To get the last modified date of test.gz in UTC
85 os.environ['TZ'] = 'UTC' # To get last modified date of test.gz in UTC
5286 try:
5387 self.assert_values(m, {
54 'magic.pyc': 'python 2.4 byte-compiled',
88 'magic._pyc_': 'python 2.4 byte-compiled',
5589 'test.pdf': 'PDF document, version 1.2',
5690 'test.gz':
57 ('gzip compressed data, was "test", from Unix, last modified: Sun Jun 29 01:32:52 2008',
58 'gzip compressed data, was "test", last modified: Sun Jun 29 01:32:52 2008, from Unix'),
91 ('gzip compressed data, was "test", from Unix, last '
92 'modified: Sun Jun 29 01:32:52 2008',
93 'gzip compressed data, was "test", last modified'
94 ': Sun Jun 29 01:32:52 2008, from Unix',
95 'gzip compressed data, was "test", last modified'
96 ': Sun Jun 29 01:32:52 2008, from Unix, original size 15',
97 'gzip compressed data, was "test", '
98 'last modified: Sun Jun 29 01:32:52 2008, '
99 'from Unix, original size modulo 2^32 15'
100 ),
59101 'text.txt': 'ASCII text',
60 })
102 'test.snappy.parquet': ('Apache Parquet', 'Par archive data'),
103 }, buf_equals_file=False)
61104 finally:
62105 del os.environ['TZ']
106
107 def test_unicode_result_nonraw(self):
108 m = magic.Magic(raw=False)
109 src = os.path.join(MagicTest.TESTDATA_DIR, 'pgpunicode')
110 result = m.from_file(src)
111 # NOTE: This check is added as otherwise some magic files don't identify the test case as a PGP key.
112 if 'PGP' in result:
113 assert r"PGP\011Secret Sub-key -" == result
114 else:
115 raise unittest.SkipTest("Magic file doesn't return expected type.")
116
117 def test_unicode_result_raw(self):
118 m = magic.Magic(raw=True)
119 src = os.path.join(MagicTest.TESTDATA_DIR, 'pgpunicode')
120 result = m.from_file(src)
121 if 'PGP' in result:
122 assert b'PGP\tSecret Sub-key -' == result.encode('utf-8')
123 else:
124 raise unittest.SkipTest("Magic file doesn't return expected type.")
63125
64126 def test_mime_encodings(self):
65127 m = magic.Magic(mime_encoding=True)
84146
85147 m = magic.Magic(mime=True)
86148 self.assertEqual(m.from_file(filename), 'image/jpeg')
87
88 m = magic.Magic(mime=True, keep_going=True)
89 self.assertEqual(m.from_file(filename), 'image/jpeg')
90149
150 try:
151 # this will throw if you have an "old" version of the library
152 # I'm otherwise not sure how to query if keep_going is supported
153 magic.version()
154 m = magic.Magic(mime=True, keep_going=True)
155 self.assertEqual(m.from_file(filename),
156 'image/jpeg\\012- application/octet-stream')
157 except NotImplementedError:
158 pass
91159
92160 def test_rethrow(self):
93161 old = magic.magic_buffer
94162 try:
95 def t(x,y):
163 def t(x, y):
96164 raise magic.MagicException("passthrough")
97165 magic.magic_buffer = t
98
99 self.assertRaises(magic.MagicException, magic.from_buffer, "hello", True)
166
167 with self.assertRaises(magic.MagicException):
168 magic.from_buffer("hello", True)
100169 finally:
101170 magic.magic_buffer = old
171
172 def test_getparam(self):
173 m = magic.Magic(mime=True)
174 try:
175 m.setparam(magic.MAGIC_PARAM_INDIR_MAX, 1)
176 self.assertEqual(m.getparam(magic.MAGIC_PARAM_INDIR_MAX), 1)
177 except NotImplementedError:
178 pass
179
180 def test_name_count(self):
181 m = magic.Magic()
182 with open(os.path.join(self.TESTDATA_DIR, 'name_use.jpg'), 'rb') as f:
183 m.from_buffer(f.read())
184
102185 if __name__ == '__main__':
103186 unittest.main()
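Note on the reworked `assert_values` above: the new version accepts a tuple of alternative results per file and can optionally skip the check that the buffer-based and file-based results agree. Below is a minimal stand-alone sketch of that logic, assuming plain strings in place of real libmagic output. The function name `check_values` and the sample values are illustrative only and do not appear in test.py.

```python
def check_values(buf_value, file_value, expected_value, buf_equals_file=True):
    """Hypothetical stand-alone version of the checks in assert_values."""
    # A single expected string is treated as a one-element tuple.
    if type(expected_value) is not tuple:
        expected_value = (expected_value,)

    # Optionally require buffer-based and file-based results to agree.
    if buf_equals_file and buf_value != file_value:
        raise AssertionError("%r != %r" % (buf_value, file_value))

    # Each result must be one of the accepted alternatives.
    for value in (buf_value, file_value):
        if value not in expected_value:
            raise AssertionError("%r not in %r" % (value, expected_value))


# Two accepted MIME types for gzip, as in test_mime_types.
check_values('application/gzip', 'application/gzip',
             ('application/gzip', 'application/x-gzip'))

# Differing results are tolerated when buf_equals_file is False.
check_values('Apache Parquet', 'Par archive data',
             ('Apache Parquet', 'Par archive data'), buf_equals_file=False)
```

Accepting several alternatives lets one test run against different libmagic releases, whose descriptions and MIME types for the same file can differ.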
test/testdata/keep-going.jpg
Binary diff not shown
+0
-1
test/testdata/lambda
0 test
test/testdata/magic.pyc
Binary diff not shown
test/testdata/test.gz
Binary diff not shown
+0
-199
test/testdata/test.pdf
0 %PDF-1.2
1 7 0 obj
2 [5 0 R/XYZ 111.6 757.86]
3 endobj
4 13 0 obj
5 <<
6 /Title(About this document)
7 /A<<
8 /S/GoTo
9 /D(subsection.1.1)
10 >>
11 /Parent 12 0 R
12 /Next 14 0 R
13 >>
14 endobj
15 15 0 obj
16 <<
17 /Title(Compiling with GHC)
18 /A<<
19 /S/GoTo
20 /D(subsubsection.1.2.1)
21 >>
22 /Parent 14 0 R
23 /Next 16 0 R
24 >>
25 endobj
26 16 0 obj
27 <<
28 /Title(Compiling with Hugs)
29 /A<<
30 /S/GoTo
31 /D(subsubsection.1.2.2)
32 >>
33 /Parent 14 0 R
34 /Prev 15 0 R
35 >>
36 endobj
37 14 0 obj
38 <<
39 /Title(Compatibility)
40 /A<<
41 /S/GoTo
42 /D(subsection.1.2)
43 >>
44 /Parent 12 0 R
45 /Prev 13 0 R
46 /First 15 0 R
47 /Last 16 0 R
48 /Count -2
49 /Next 17 0 R
50 >>
51 endobj
52 17 0 obj
53 <<
54 /Title(Reporting bugs)
55 /A<<
56 /S/GoTo
57 /D(subsection.1.3)
58 >>
59 /Parent 12 0 R
60 /Prev 14 0 R
61 /Next 18 0 R
62 >>
63 endobj
64 18 0 obj
65 <<
66 /Title(History)
67 /A<<
68 /S/GoTo
69 /D(subsection.1.4)
70 >>
71 /Parent 12 0 R
72 /Prev 17 0 R
73 /Next 19 0 R
74 >>
75 endobj
76 19 0 obj
77 <<
78 /Title(License)
79 /A<<
80 /S/GoTo
81 /D(subsection.1.5)
82 >>
83 /Parent 12 0 R
84 /Prev 18 0 R
85 >>
86 endobj
87 12 0 obj
88 <<
89 /Title(Introduction)
90 /A<<
91 /S/GoTo
92 /D(section.1)
93 >>
94 /Parent 11 0 R
95 /First 13 0 R
96 /Last 19 0 R
97 /Count -5
98 /Next 20 0 R
99 >>
100 endobj
101 21 0 obj
102 <<
103 /Title(Running a parser)
104 /A<<
105 /S/GoTo
106 /D(subsection.2.1)
107 >>
108 /Parent 20 0 R
109 /Next 22 0 R
110 >>
111 endobj
112 22 0 obj
113 <<
114 /Title(Sequence and choice)
115 /A<<
116 /S/GoTo
117 /D(subsection.2.2)
118 >>
119 /Parent 20 0 R
120 /Prev 21 0 R
121 /Next 23 0 R
122 >>
123 endobj
124 23 0 obj
125 <<
126 /Title(Predictive parsers)
127 /A<<
128 /S/GoTo
129 /D(subsection.2.3)
130 >>
131 /Parent 20 0 R
132 /Prev 22 0 R
133 /Next 24 0 R
134 >>
135 endobj
136 24 0 obj
137 <<
138 /Title(Adding semantics)
139 /A<<
140 /S/GoTo
141 /D(subsection.2.4)
142 >>
143 /Parent 20 0 R
144 /Prev 23 0 R
145 /Next 25 0 R
146 >>
147 endobj
148 25 0 obj
149 <<
150 /Title(Sequences and seperators)
151 /A<<
152 /S/GoTo
153 /D(subsection.2.5)
154 >>
155 /Parent 20 0 R
156 /Prev 24 0 R
157 /Next 26 0 R
158 >>
159 endobj
160 26 0 obj
161 <<
162 /Title(Improving error messages)
163 /A<<
164 /S/GoTo
165 /D(subsection.2.6)
166 >>
167 /Parent 20 0 R
168 /Prev 25 0 R
169 /Next 27 0 R
170 >>
171 endobj
172 27 0 obj
173 <<
174 /Title(Expressions)
175 /A<<
176 /S/GoTo
177 /D(subsection.2.7)
178 >>
179 /Parent 20 0 R
180 /Prev 26 0 R
181 /Next 28 0 R
182 >>
183 endobj
184 28 0 obj
185 <<
186 /Title(Lexical analysis)
187 /A<<
188 /S/GoTo
189 /D(subsection.2.8)
190 >>
191 /Parent 20 0 R
192 /Prev 27 0 R
193 /Next 29 0 R
194 >>
195 endobj
196 30 0 obj
197 <<
198 /Title(Lexeme parsers
+0
-2
test/testdata/text-iso8859-1.txt
0 This is a web page encoded in iso-8859-1
1 יטאשפגןמ
+0
-2
test/testdata/text.txt
0 Hello, World!
1