=======
Credits
=======

Developer
---------

* https://github.com/anthonyhseb

Contributors
------------

* https://github.com/rakeshsagalagatte
* https://github.com/hildogjr
.. highlight:: shell

============
Contributing
============

Contributions are welcome, and they are greatly appreciated! Every
little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions
----------------------

Report Bugs
~~~~~~~~~~~

Report bugs at https://github.com/anthonyhseb/googlesearch/issues.

If you are reporting a bug, please include:

* Your operating system name and version.
* Any details about your local setup that might be helpful in troubleshooting.
* Detailed steps to reproduce the bug.

Fix Bugs
~~~~~~~~

Look through the GitHub issues for bugs. Anything tagged with "bug"
and "help wanted" is open to whoever wants to implement it.

Implement Features
~~~~~~~~~~~~~~~~~~

Look through the GitHub issues for features. Anything tagged with "enhancement"
and "help wanted" is open to whoever wants to implement it.

Write Documentation
~~~~~~~~~~~~~~~~~~~

google-search could always use more documentation, whether as part of the
official google-search docs, in docstrings, or even on the web in blog posts,
articles, and such.

Submit Feedback
~~~~~~~~~~~~~~~

The best way to send feedback is to file an issue at https://github.com/anthonyhseb/googlesearch/issues.

If you are proposing a feature:

* Explain in detail how it would work.
* Keep the scope as narrow as possible, to make it easier to implement.
* Remember that this is a volunteer-driven project, and that contributions
  are welcome :)

Get Started!
------------

Ready to contribute? Here's how to set up ``googlesearch`` for local development.

1. Fork the ``googlesearch`` repo on GitHub.
2. Clone your fork locally::

    $ git clone [email protected]:your_name_here/googlesearch.git

3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::

    $ mkvirtualenv googlesearch
    $ cd googlesearch/
    $ python setup.py develop

4. Create a branch for local development::

    $ git checkout -b name-of-your-bugfix-or-feature

   Now you can make your changes locally.

5. When you're done making changes, check that your changes pass the tests, including testing other Python versions with tox::

    $ python setup.py test  # or: py.test
    $ tox

6. Commit your changes and push your branch to GitHub::

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature

7. Submit a pull request through the GitHub website.

Pull Request Guidelines
-----------------------

Before you submit a pull request, check that it meets these guidelines:

1. The pull request should include tests.
2. If the pull request adds functionality, the docs should be updated. Put
   your new functionality into a function with a docstring, and add the
   feature to the list in README.rst.
3. The pull request should work for Python 2.7, 3.6 and 3.8. Check
   https://travis-ci.org/anthonyhseb/googlesearch/pull_requests
   and make sure that the tests pass for all supported Python versions.

Tips
----

To run a subset of tests::

    $ python -m unittest tests.test_googlesearch
=======
History
=======

1.0.0 (2017-05-06)
------------------

* First release on PyPI.
MIT License

Copyright (c) 2017, Anthony Hseb

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
include README.md
include MANIFEST.in
include setup.py
include scripts/google
include requirements.txt
include googlesearch/user_agents.txt.gz
include AUTHORS.rst
include CONTRIBUTING.rst
include HISTORY.rst
include LICENSE
include README.rst
include googlesearch/browser_agents.txt

recursive-include tests *
recursive-exclude * __pycache__
recursive-exclude * *.py[co]

recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif
Metadata-Version: 1.1
Name: google
Version: 2.0.3
Summary: Python bindings to the Google search engine.
Home-page: http://breakingcode.wordpress.com/
Author: Mario Vilas
Author-email: [email protected]
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Environment :: Console
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires: beautifulsoup4
Provides: googlesearch

Metadata-Version: 2.1
Name: google-search
Version: 1.1.1
Summary: Library for scraping google search results
Home-page: https://github.com/anthonyhseb/googlesearch
Author: Anthony Hseb
Author-email: [email protected]
License: MIT license
Keywords: googlesearch
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.8
License-File: LICENSE
License-File: AUTHORS.rst

=============
google-search
=============


.. image:: https://img.shields.io/pypi/v/google-search.svg
        :target: https://pypi.python.org/pypi/google-search

.. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg
        :target: https://travis-ci.org/anthonyhseb/googlesearch

.. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest
        :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg
        :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/
        :alt: Updates


Library for scraping Google search results.

* Usage::

    from googlesearch.googlesearch import GoogleSearch
    response = GoogleSearch().search("something")
    for result in response.results:
        print("Title: " + result.title)
        print("Content: " + result.getText())

* Free software: MIT license

Features
--------

Run a Google search and fetch the individual results (full HTML and text contents). By default the result URLs are fetched eagerly when the search request is made, with 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``getMarkup()`` is called by passing ``prefetch_results=False`` to the search method.

Pass ``num_results`` to the search method to set the maximum number of results.

``SearchResponse.total`` gives the total number of results on Google.

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage


=======
History
=======

1.0.0 (2017-05-06)
------------------

* First release on PyPI.
googlesearch
============

Google search from Python.

https://python-googlesearch.readthedocs.io/en/latest/

Usage example
-------------

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from googlesearch import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)

Installing
----------

    pip install google
=============
google-search
=============


.. image:: https://img.shields.io/pypi/v/google-search.svg
        :target: https://pypi.python.org/pypi/google-search

.. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg
        :target: https://travis-ci.org/anthonyhseb/googlesearch

.. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest
        :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg
        :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/
        :alt: Updates


Library for scraping Google search results.

* Usage::

    from googlesearch.googlesearch import GoogleSearch
    response = GoogleSearch().search("something")
    for result in response.results:
        print("Title: " + result.title)
        print("Content: " + result.getText())

* Free software: MIT license

Features
--------

Run a Google search and fetch the individual results (full HTML and text contents). By default the result URLs are fetched eagerly when the search request is made, with 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``getMarkup()`` is called by passing ``prefetch_results=False`` to the search method.

Pass ``num_results`` to the search method to set the maximum number of results.

``SearchResponse.total`` gives the total number of results on Google.
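The eager fetching described above amounts to mapping a fetch function over the result URLs with a small thread pool. A minimal, self-contained sketch of that idea follows; ``prefetch`` and ``fetch`` are illustrative names for this sketch, not part of the library's API:

```python
from concurrent.futures import ThreadPoolExecutor

def prefetch(urls, fetch, max_workers=10):
    # Fetch every result URL in parallel (10 workers by default),
    # mirroring the eager prefetch done when the search request is made.
    # Results come back in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Passing prefetch_results=False to the search method skips this step,
# deferring each fetch until getText() or getMarkup() is called.
```

Any callable that maps a URL to its content can stand in for ``fetch``, which also makes the behaviour easy to exercise without network access.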

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
python-googlesearch (2.0.3+git20210326.1.e2d3e74-0kali1) UNRELEASED; urgency=low

  * New upstream snapshot.

 -- Kali Janitor <[email protected]>  Fri, 25 Nov 2022 03:21:04 -0000

python-googlesearch (2.0.3-0kali1) kali-dev; urgency=medium

  [ Sophie Brun ]
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/googlesearch.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/googlesearch.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/googlesearch"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/googlesearch"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
.. include:: ../AUTHORS.rst
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# googlesearch documentation build configuration file, created by
# sphinx-quickstart on Tue Jul 9 22:26:36 2013.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

import sys
import os

# If extensions (or modules to document with autodoc) are in another
# directory, add these directories to sys.path here. If the directory is
# relative to the documentation root, use os.path.abspath to make it
# absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))

# Get the project root dir, which is the parent dir of this
cwd = os.getcwd()
project_root = os.path.dirname(cwd)

# Insert the project root dir as the first element in the PYTHONPATH.
# This lets us ensure that the source package is imported, and that its
# version is used.
sys.path.insert(0, project_root)

import googlesearch

# -- General configuration ---------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix of source filenames.
source_suffix = '.rst'

# The encoding of source files.
#source_encoding = 'utf-8-sig'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'google-search'
copyright = u"2017, Anthony Hseb"

# The version info for the project you're documenting, acts as replacement
# for |version| and |release|, also used in various other places throughout
# the built documents.
#
# The short X.Y version.
version = googlesearch.__version__
# The full version, including alpha/beta/rc tags.
release = googlesearch.__version__

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#language = None

# There are two options for replacing |today|: either, you set today to
# some non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']

# The reST default role (used for this markup: `text`) to use for all
# documents.
#default_role = None

# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True

# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True

# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []

# If true, keep warnings as "system message" paragraphs in the built
# documents.
#keep_warnings = False


# -- Options for HTML output -------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'

# Theme options are theme-specific and customize the look and feel of a
# theme further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}

# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []

# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None

# A shorter title for the navigation bar. Default is the same as
# html_title.
#html_short_title = None

# The name of an image file (relative to this directory) to place at the
# top of the sidebar.
#html_logo = None

# The name of an image file (within the static path) to use as favicon
# of the docs. This file should be a Windows icon file (.ico) being
# 16x16 or 32x32 pixels large.
#html_favicon = None

# Add any paths that contain custom static files (such as style sheets)
# here, relative to this directory. They are copied after the builtin
# static files, so a file named "default.css" will overwrite the builtin
# "default.css".
html_static_path = ['_static']

# If not '', a 'Last updated on:' timestamp is inserted at every page
# bottom, using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'

# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True

# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}

# Additional templates that should be rendered to pages, maps page names
# to template names.
#html_additional_pages = {}

# If false, no module index is generated.
#html_domain_indices = True

# If false, no index is generated.
#html_use_index = True

# If true, the index is split into individual pages for each letter.
#html_split_index = False

# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True

# If true, "Created using Sphinx" is shown in the HTML footer.
# Default is True.
#html_show_sphinx = True

# If true, "(C) Copyright ..." is shown in the HTML footer.
# Default is True.
#html_show_copyright = True

# If true, an OpenSearch description file will be output, and all pages
# will contain a <link> tag referring to it. The value of this option
# must be the base URL from which the finished HTML is served.
#html_use_opensearch = ''

# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None

# Output file base name for HTML help builder.
htmlhelp_basename = 'googlesearchdoc'


# -- Options for LaTeX output ------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #'papersize': 'letterpaper',

    # The font size ('10pt', '11pt' or '12pt').
    #'pointsize': '10pt',

    # Additional stuff for the LaTeX preamble.
    #'preamble': '',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass
# [howto/manual]).
latex_documents = [
    ('index', 'googlesearch.tex',
     u'google-search Documentation',
     u'Anthony Hseb', 'manual'),
]

# The name of an image file (relative to this directory) to place at
# the top of the title page.
#latex_logo = None

# For "manual" documents, if this is true, then toplevel headings
# are parts, not chapters.
#latex_use_parts = False

# If true, show page references after internal links.
#latex_show_pagerefs = False

# If true, show URL addresses after external links.
#latex_show_urls = False

# Documents to append as an appendix to all manuals.
#latex_appendices = []

# If false, no module index is generated.
#latex_domain_indices = True


# -- Options for manual page output ------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
    ('index', 'googlesearch',
     u'google-search Documentation',
     [u'Anthony Hseb'], 1)
]

# If true, show URL addresses after external links.
#man_show_urls = False


# -- Options for Texinfo output ----------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
#  dir menu entry, description, category)
texinfo_documents = [
    ('index', 'googlesearch',
     u'google-search Documentation',
     u'Anthony Hseb',
     'googlesearch',
     'Library for scraping google search results.',
     'Miscellaneous'),
]

# Documents to append as an appendix to all manuals.
#texinfo_appendices = []

# If false, no module index is generated.
#texinfo_domain_indices = True

# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'

# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False
0 | .. include:: ../CONTRIBUTING.rst |
0 | .. include:: ../HISTORY.rst |
0 | Welcome to google-search's documentation! | |
1 | ========================================= | |
2 | ||
3 | Contents: | |
4 | ||
5 | .. toctree:: | |
6 | :maxdepth: 2 | |
7 | ||
8 | readme | |
9 | installation | |
10 | usage | |
11 | contributing | |
12 | authors | |
13 | history | |
13 | ||
14 | Indices and tables | |
15 | ================== | |
16 | ||
17 | * :ref:`genindex` | |
18 | * :ref:`modindex` | |
19 | * :ref:`search` |
0 | .. highlight:: shell | |
1 | ||
2 | ============ | |
3 | Installation | |
4 | ============ | |
5 | ||
6 | ||
7 | Stable release | |
8 | -------------- | |
9 | ||
10 | To install google-search, run this command in your terminal: | |
11 | ||
12 | .. code-block:: console | |
13 | ||
14 | $ pip install google-search | |
15 | ||
16 | This is the preferred method to install google-search, as it will always install the most recent stable release. | |
17 | ||
18 | If you don't have `pip`_ installed, this `Python installation guide`_ can guide | |
19 | you through the process. | |
20 | ||
21 | .. _pip: https://pip.pypa.io | |
22 | .. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/ | |
23 | ||
24 | ||
25 | From sources | |
26 | ------------ | |
27 | ||
28 | The sources for google-search can be downloaded from the `Github repo`_. | |
29 | ||
30 | You can either clone the public repository: | |
31 | ||
32 | .. code-block:: console | |
33 | ||
34 | $ git clone https://github.com/anthonyhseb/googlesearch | |
35 | ||
36 | Or download the `tarball`_: | |
37 | ||
38 | .. code-block:: console | |
39 | ||
40 | $ curl -OL https://github.com/anthonyhseb/googlesearch/tarball/master | |
41 | ||
42 | Once you have a copy of the source, you can install it with: | |
43 | ||
44 | .. code-block:: console | |
45 | ||
46 | $ pip install . | |
47 | ||
48 | ||
49 | .. _Github repo: https://github.com/anthonyhseb/googlesearch | |
50 | .. _tarball: https://github.com/anthonyhseb/googlesearch/tarball/master |
0 | @ECHO OFF | |
1 | ||
2 | REM Command file for Sphinx documentation | |
3 | ||
4 | if "%SPHINXBUILD%" == "" ( | |
5 | set SPHINXBUILD=sphinx-build | |
6 | ) | |
7 | set BUILDDIR=_build | |
8 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . | |
9 | set I18NSPHINXOPTS=%SPHINXOPTS% . | |
10 | if NOT "%PAPER%" == "" ( | |
11 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% | |
12 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% | |
13 | ) | |
14 | ||
15 | if "%1" == "" goto help | |
16 | ||
17 | if "%1" == "help" ( | |
18 | :help | |
19 | echo.Please use `make ^<target^>` where ^<target^> is one of | |
20 | echo. html to make standalone HTML files | |
21 | echo. dirhtml to make HTML files named index.html in directories | |
22 | echo. singlehtml to make a single large HTML file | |
23 | echo. pickle to make pickle files | |
24 | echo. json to make JSON files | |
25 | echo. htmlhelp to make HTML files and a HTML help project | |
26 | echo. qthelp to make HTML files and a qthelp project | |
27 | echo. devhelp to make HTML files and a Devhelp project | |
28 | echo. epub to make an epub | |
29 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter | |
30 | echo. text to make text files | |
31 | echo. man to make manual pages | |
32 | echo. texinfo to make Texinfo files | |
33 | echo. gettext to make PO message catalogs | |
34 | echo. changes to make an overview over all changed/added/deprecated items | |
35 | echo. xml to make Docutils-native XML files | |
36 | echo. pseudoxml to make pseudoxml-XML files for display purposes | |
37 | echo. linkcheck to check all external links for integrity | |
38 | echo. doctest to run all doctests embedded in the documentation if enabled | |
39 | goto end | |
40 | ) | |
41 | ||
42 | if "%1" == "clean" ( | |
43 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i | |
44 | del /q /s %BUILDDIR%\* | |
45 | goto end | |
46 | ) | |
47 | ||
48 | ||
49 | %SPHINXBUILD% 2> nul | |
50 | if errorlevel 9009 ( | |
51 | echo. | |
52 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx | |
53 | echo.installed, then set the SPHINXBUILD environment variable to point | |
54 | echo.to the full path of the 'sphinx-build' executable. Alternatively you | |
55 | echo.may add the Sphinx directory to PATH. | |
56 | echo. | |
57 | echo.If you don't have Sphinx installed, grab it from | |
58 | echo.http://sphinx-doc.org/ | |
59 | exit /b 1 | |
60 | ) | |
61 | ||
62 | if "%1" == "html" ( | |
63 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html | |
64 | if errorlevel 1 exit /b 1 | |
65 | echo. | |
66 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. | |
67 | goto end | |
68 | ) | |
69 | ||
70 | if "%1" == "dirhtml" ( | |
71 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml | |
72 | if errorlevel 1 exit /b 1 | |
73 | echo. | |
74 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. | |
75 | goto end | |
76 | ) | |
77 | ||
78 | if "%1" == "singlehtml" ( | |
79 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml | |
80 | if errorlevel 1 exit /b 1 | |
81 | echo. | |
82 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. | |
83 | goto end | |
84 | ) | |
85 | ||
86 | if "%1" == "pickle" ( | |
87 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle | |
88 | if errorlevel 1 exit /b 1 | |
89 | echo. | |
90 | echo.Build finished; now you can process the pickle files. | |
91 | goto end | |
92 | ) | |
93 | ||
94 | if "%1" == "json" ( | |
95 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json | |
96 | if errorlevel 1 exit /b 1 | |
97 | echo. | |
98 | echo.Build finished; now you can process the JSON files. | |
99 | goto end | |
100 | ) | |
101 | ||
102 | if "%1" == "htmlhelp" ( | |
103 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp | |
104 | if errorlevel 1 exit /b 1 | |
105 | echo. | |
106 | echo.Build finished; now you can run HTML Help Workshop with the ^ | |
107 | .hhp project file in %BUILDDIR%/htmlhelp. | |
108 | goto end | |
109 | ) | |
110 | ||
111 | if "%1" == "qthelp" ( | |
112 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp | |
113 | if errorlevel 1 exit /b 1 | |
114 | echo. | |
115 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ | |
116 | .qhcp project file in %BUILDDIR%/qthelp, like this: | |
117 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\googlesearch.qhcp | |
118 | echo.To view the help file: | |
119 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\googlesearch.ghc | |
120 | goto end | |
121 | ) | |
122 | ||
123 | if "%1" == "devhelp" ( | |
124 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp | |
125 | if errorlevel 1 exit /b 1 | |
126 | echo. | |
127 | echo.Build finished. | |
128 | goto end | |
129 | ) | |
130 | ||
131 | if "%1" == "epub" ( | |
132 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub | |
133 | if errorlevel 1 exit /b 1 | |
134 | echo. | |
135 | echo.Build finished. The epub file is in %BUILDDIR%/epub. | |
136 | goto end | |
137 | ) | |
138 | ||
139 | if "%1" == "latex" ( | |
140 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex | |
141 | if errorlevel 1 exit /b 1 | |
142 | echo. | |
143 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. | |
144 | goto end | |
145 | ) | |
146 | ||
147 | if "%1" == "latexpdf" ( | |
148 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex | |
149 | cd %BUILDDIR%/latex | |
150 | make all-pdf | |
151 | cd %BUILDDIR%/.. | |
152 | echo. | |
153 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. | |
154 | goto end | |
155 | ) | |
156 | ||
157 | if "%1" == "latexpdfja" ( | |
158 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex | |
159 | cd %BUILDDIR%/latex | |
160 | make all-pdf-ja | |
161 | cd %BUILDDIR%/.. | |
162 | echo. | |
163 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. | |
164 | goto end | |
165 | ) | |
166 | ||
167 | if "%1" == "text" ( | |
168 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text | |
169 | if errorlevel 1 exit /b 1 | |
170 | echo. | |
171 | echo.Build finished. The text files are in %BUILDDIR%/text. | |
172 | goto end | |
173 | ) | |
174 | ||
175 | if "%1" == "man" ( | |
176 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man | |
177 | if errorlevel 1 exit /b 1 | |
178 | echo. | |
179 | echo.Build finished. The manual pages are in %BUILDDIR%/man. | |
180 | goto end | |
181 | ) | |
182 | ||
183 | if "%1" == "texinfo" ( | |
184 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo | |
185 | if errorlevel 1 exit /b 1 | |
186 | echo. | |
187 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. | |
188 | goto end | |
189 | ) | |
190 | ||
191 | if "%1" == "gettext" ( | |
192 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale | |
193 | if errorlevel 1 exit /b 1 | |
194 | echo. | |
195 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. | |
196 | goto end | |
197 | ) | |
198 | ||
199 | if "%1" == "changes" ( | |
200 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes | |
201 | if errorlevel 1 exit /b 1 | |
202 | echo. | |
203 | echo.The overview file is in %BUILDDIR%/changes. | |
204 | goto end | |
205 | ) | |
206 | ||
207 | if "%1" == "linkcheck" ( | |
208 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck | |
209 | if errorlevel 1 exit /b 1 | |
210 | echo. | |
211 | echo.Link check complete; look for any errors in the above output ^ | |
212 | or in %BUILDDIR%/linkcheck/output.txt. | |
213 | goto end | |
214 | ) | |
215 | ||
216 | if "%1" == "doctest" ( | |
217 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest | |
218 | if errorlevel 1 exit /b 1 | |
219 | echo. | |
220 | echo.Testing of doctests in the sources finished, look at the ^ | |
221 | results in %BUILDDIR%/doctest/output.txt. | |
222 | goto end | |
223 | ) | |
224 | ||
225 | if "%1" == "xml" ( | |
226 | %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml | |
227 | if errorlevel 1 exit /b 1 | |
228 | echo. | |
229 | echo.Build finished. The XML files are in %BUILDDIR%/xml. | |
230 | goto end | |
231 | ) | |
232 | ||
233 | if "%1" == "pseudoxml" ( | |
234 | %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml | |
235 | if errorlevel 1 exit /b 1 | |
236 | echo. | |
237 | echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. | |
238 | goto end | |
239 | ) | |
240 | ||
241 | :end |
0 | .. include:: ../README.rst |
0 | ===== | |
1 | Usage | |
2 | ===== | |
3 | ||
4 | To use google-search in a project:: | |
5 | ||
6 | from googlesearch.googlesearch import GoogleSearch | |
7 | response = GoogleSearch().search("something") | |
8 | for result in response.results: | |
9 | print("Title: " + result.title) | |
10 | print("URL: " + result.url) | |
11 | print("Content: " + result.getText()) | |
12 | print("Html: " + result.getMarkup()) |
0 | Metadata-Version: 1.1 | |
1 | Name: google | |
2 | Version: 2.0.3 | |
3 | Summary: Python bindings to the Google search engine. | |
4 | Home-page: http://breakingcode.wordpress.com/ | |
5 | Author: Mario Vilas | |
6 | Author-email: [email protected] | |
7 | License: UNKNOWN | |
8 | Description: UNKNOWN | |
9 | Platform: UNKNOWN | |
10 | Classifier: Development Status :: 5 - Production/Stable | |
11 | Classifier: Intended Audience :: Developers | |
12 | Classifier: License :: OSI Approved :: BSD License | |
13 | Classifier: Environment :: Console | |
14 | Classifier: Programming Language :: Python | |
15 | Classifier: Topic :: Software Development :: Libraries :: Python Modules | |
16 | Requires: beautifulsoup4 | |
17 | Provides: googlesearch |
0 | MANIFEST.in | |
1 | README.md | |
2 | requirements.txt | |
3 | setup.cfg | |
4 | setup.py | |
5 | google.egg-info/PKG-INFO | |
6 | google.egg-info/SOURCES.txt | |
7 | google.egg-info/dependency_links.txt | |
8 | google.egg-info/requires.txt | |
9 | google.egg-info/top_level.txt | |
10 | googlesearch/__init__.py | |
11 | googlesearch/user_agents.txt.gz | |
12 | scripts/google
0 | Metadata-Version: 2.1 | |
1 | Name: google-search | |
2 | Version: 1.1.1 | |
3 | Summary: Library for scraping google search results | |
4 | Home-page: https://github.com/anthonyhseb/googlesearch | |
5 | Author: Anthony Hseb | |
6 | Author-email: [email protected] | |
7 | License: MIT license | |
8 | Keywords: googlesearch | |
9 | Classifier: Development Status :: 2 - Pre-Alpha | |
10 | Classifier: Intended Audience :: Developers | |
11 | Classifier: License :: OSI Approved :: MIT License | |
12 | Classifier: Natural Language :: English | |
13 | Classifier: Programming Language :: Python :: 2 | |
14 | Classifier: Programming Language :: Python :: 2.7 | |
15 | Classifier: Programming Language :: Python :: 3 | |
16 | Classifier: Programming Language :: Python :: 3.6 | |
17 | Classifier: Programming Language :: Python :: 3.8 | |
18 | License-File: LICENSE | |
19 | License-File: AUTHORS.rst | |
20 | ||
21 | ============= | |
22 | google-search | |
23 | ============= | |
24 | ||
25 | ||
26 | .. image:: https://img.shields.io/pypi/v/google-search.svg | |
27 | :target: https://pypi.python.org/pypi/google-search | |
28 | ||
29 | .. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg | |
30 | :target: https://travis-ci.org/anthonyhseb/googlesearch | |
31 | ||
32 | .. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest | |
33 | :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest | |
34 | :alt: Documentation Status | |
35 | ||
36 | .. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg | |
37 | :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/ | |
38 | :alt: Updates | |
39 | ||
40 | ||
41 | Library for scraping google search results. | |
42 | ||
43 | * Usage:: | |
44 | ||
45 | from googlesearch.googlesearch import GoogleSearch | |
46 | response = GoogleSearch().search("something") | |
47 | for result in response.results: | |
48 | print("Title: " + result.title) | |
49 | print("Content: " + result.getText()) | |
50 | ||
51 | ||
52 | ||
53 | * Free software: MIT license | |
54 | ||
55 | Features | |
56 | -------- | |
57 | ||
58 | Run a Google search and fetch the individual results (full HTML and text contents). By default, the result URLs are fetched eagerly when the search request is made, using 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``searchResult.getMarkup()`` is called by passing ``prefetch_results=False`` to the search method. | |
59 | ||
60 | Pass ``num_results`` to the search method to set the maximum number of results. | |
61 | ||
62 | ``SearchResponse.total`` gives the total number of results on Google. | |
63 | ||
64 | Credits | |
65 | --------- | |
66 | ||
67 | This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template. | |
68 | ||
69 | .. _Cookiecutter: https://github.com/audreyr/cookiecutter | |
70 | .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage | |
71 | ||
72 | ||
73 | ||
74 | ======= | |
75 | History | |
76 | ======= | |
77 | ||
78 | 1.0.0 (2017-05-06) | |
79 | ------------------ | |
80 | ||
81 | * First release on PyPI. |
0 | AUTHORS.rst | |
1 | CONTRIBUTING.rst | |
2 | HISTORY.rst | |
3 | LICENSE | |
4 | MANIFEST.in | |
5 | README.rst | |
6 | setup.cfg | |
7 | setup.py | |
8 | docs/Makefile | |
9 | docs/authors.rst | |
10 | docs/conf.py | |
11 | docs/contributing.rst | |
12 | docs/history.rst | |
13 | docs/index.rst | |
14 | docs/installation.rst | |
15 | docs/make.bat | |
16 | docs/readme.rst | |
17 | docs/usage.rst | |
18 | google_search.egg-info/PKG-INFO | |
19 | google_search.egg-info/SOURCES.txt | |
20 | google_search.egg-info/dependency_links.txt | |
21 | google_search.egg-info/not-zip-safe | |
22 | google_search.egg-info/requires.txt | |
23 | google_search.egg-info/top_level.txt | |
24 | googlesearch/__init__.py | |
25 | googlesearch/browser_agents.txt | |
26 | googlesearch/googlesearch.py | |
27 | tests/__init__.py | |
28 | tests/test_googlesearch.py
0 | googlesearch |
0 | #!/usr/bin/env python | |
1 | ||
2 | # Python bindings to the Google search engine | |
3 | # Copyright (c) 2009-2019, Mario Vilas | |
4 | # All rights reserved. | |
5 | # | |
6 | # Redistribution and use in source and binary forms, with or without | |
7 | # modification, are permitted provided that the following conditions are met: | |
8 | # | |
9 | # * Redistributions of source code must retain the above copyright notice, | |
10 | # this list of conditions and the following disclaimer. | |
11 | # * Redistributions in binary form must reproduce the above copyright | |
12 | # notice,this list of conditions and the following disclaimer in the | |
13 | # documentation and/or other materials provided with the distribution. | |
14 | # * Neither the name of the copyright holder nor the names of its | |
15 | # contributors may be used to endorse or promote products derived from | |
16 | # this software without specific prior written permission. | |
17 | # | |
18 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | |
19 | # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
20 | # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
21 | # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE | |
22 | # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | |
23 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | |
24 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | |
25 | # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | |
26 | # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | |
27 | # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | |
28 | # POSSIBILITY OF SUCH DAMAGE. | |
29 | ||
30 | import os | |
31 | import random | |
32 | import sys | |
33 | import time | |
34 | ||
35 | if sys.version_info[0] > 2: | |
36 | from http.cookiejar import LWPCookieJar | |
37 | from urllib.request import Request, urlopen | |
38 | from urllib.parse import quote_plus, urlparse, parse_qs | |
39 | else: | |
40 | from cookielib import LWPCookieJar | |
41 | from urllib import quote_plus | |
42 | from urllib2 import Request, urlopen | |
43 | from urlparse import urlparse, parse_qs | |
44 | ||
45 | try: | |
46 | from bs4 import BeautifulSoup | |
47 | is_bs4 = True | |
48 | except ImportError: | |
49 | from BeautifulSoup import BeautifulSoup | |
50 | is_bs4 = False | |
51 | ||
52 | __all__ = [ | |
53 | ||
54 | # Main search function. | |
55 | 'search', | |
56 | ||
57 | # Specialized search functions. | |
58 | 'search_images', 'search_news', | |
59 | 'search_videos', 'search_shop', | |
60 | 'search_books', 'search_apps', | |
61 | ||
62 | # Shortcut for "get lucky" search. | |
63 | 'lucky', | |
64 | ||
65 | # Miscellaneous utility functions. | |
66 | 'get_random_user_agent', 'get_tbs', | |
67 | ] | |
68 | ||
69 | # URL templates to make Google searches. | |
70 | url_home = "https://www.google.%(tld)s/" | |
71 | url_search = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \ | |
72 | "btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&tbm=%(tpe)s&" \ | |
73 | "cr=%(country)s" | |
74 | url_next_page = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \ | |
75 | "start=%(start)d&tbs=%(tbs)s&safe=%(safe)s&tbm=%(tpe)s&" \ | |
76 | "cr=%(country)s" | |
77 | url_search_num = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \ | |
78 | "num=%(num)d&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&" \ | |
79 | "tbm=%(tpe)s&cr=%(country)s" | |
80 | url_next_page_num = "https://www.google.%(tld)s/search?hl=%(lang)s&" \ | |
81 | "q=%(query)s&num=%(num)d&start=%(start)d&tbs=%(tbs)s&" \ | |
82 | "safe=%(safe)s&tbm=%(tpe)s&cr=%(country)s" | |
83 | url_parameters = ( | |
84 | 'hl', 'q', 'num', 'btnG', 'start', 'tbs', 'safe', 'tbm', 'cr') | |
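The templates above are old-style ``%`` format strings whose placeholders are filled from ``search()``'s local variables via ``vars()``. A standalone sketch of that expansion, using the ``url_search`` template with hypothetical parameter values:

```python
# The template string is copied from this module; the parameter values
# below are hypothetical examples of search()'s local variables.
url_search = ("https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&"
              "btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&tbm=%(tpe)s&"
              "cr=%(country)s")

params = {'tld': 'com', 'lang': 'en', 'query': 'python+tutorial',
          'tbs': '0', 'safe': 'off', 'tpe': '', 'country': ''}
url = url_search % params
# url == 'https://www.google.com/search?hl=en&q=python+tutorial'
#        '&btnG=Google+Search&tbs=0&safe=off&tbm=&cr='
```

Note that the query value must already be URL-encoded (the module runs it through ``quote_plus`` before formatting).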
85 | ||
86 | # Cookie jar. Stored at the user's home folder. | |
87 | # If the cookie jar is inaccessible, the errors are ignored. | |
88 | home_folder = os.getenv('HOME') | |
89 | if not home_folder: | |
90 | home_folder = os.getenv('USERPROFILE') | |
91 | if not home_folder: | |
92 | home_folder = '.' # Use the current folder on error. | |
93 | cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie')) | |
94 | try: | |
95 | cookie_jar.load() | |
96 | except Exception: | |
97 | pass | |
98 | ||
99 | # Default user agent, unless instructed by the user to change it. | |
100 | USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)' | |
101 | ||
102 | # Load the list of valid user agents from the install folder. | |
103 | # The search order is: | |
104 | # * user_agents.txt.gz | |
105 | # * user_agents.txt | |
106 | # * default user agent | |
107 | try: | |
108 | install_folder = os.path.abspath(os.path.split(__file__)[0]) | |
109 | try: | |
110 | user_agents_file = os.path.join(install_folder, 'user_agents.txt.gz') | |
111 | import gzip | |
112 | fp = gzip.open(user_agents_file, 'rb') | |
113 | try: | |
114 | user_agents_list = [_.strip() for _ in fp.readlines()] | |
115 | finally: | |
116 | fp.close() | |
117 | del fp | |
118 | except Exception: | |
119 | user_agents_file = os.path.join(install_folder, 'user_agents.txt') | |
120 | with open(user_agents_file) as fp: | |
121 | user_agents_list = [_.strip() for _ in fp.readlines()] | |
122 | except Exception: | |
123 | user_agents_list = [USER_AGENT] | |
124 | ||
125 | ||
126 | # Get a random user agent. | |
127 | def get_random_user_agent(): | |
128 | """ | |
129 | Get a random user agent string. | |
130 | ||
131 | :rtype: str | |
132 | :return: Random user agent string. | |
133 | """ | |
134 | return random.choice(user_agents_list) | |
135 | ||
136 | ||
137 | # Helper function to format the tbs parameter. | |
138 | def get_tbs(from_date, to_date): | |
139 | """ | |
140 | Helper function to format the tbs parameter. | |
141 | ||
142 | :param datetime.date from_date: Python date object. | |
143 | :param datetime.date to_date: Python date object. | |
144 | ||
145 | :rtype: str | |
146 | :return: Dates encoded in tbs format. | |
147 | """ | |
148 | from_date = from_date.strftime('%m/%d/%Y') | |
149 | to_date = to_date.strftime('%m/%d/%Y') | |
150 | return 'cdr:1,cd_min:%(from_date)s,cd_max:%(to_date)s' % vars() | |
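As a worked example of this encoding, the helper turns a pair of dates into Google's ``cdr`` (custom date range) value. A minimal standalone sketch that mirrors the same logic (the local ``make_tbs`` name is illustrative, not part of the module):

```python
# Mirrors get_tbs(): format both dates as MM/DD/YYYY and wrap them in
# Google's "cdr" (custom date range) encoding.
import datetime

def make_tbs(from_date, to_date):  # illustrative stand-in for get_tbs()
    from_s = from_date.strftime('%m/%d/%Y')
    to_s = to_date.strftime('%m/%d/%Y')
    return 'cdr:1,cd_min:%s,cd_max:%s' % (from_s, to_s)

tbs = make_tbs(datetime.date(2017, 1, 1), datetime.date(2017, 1, 31))
# tbs == 'cdr:1,cd_min:01/01/2017,cd_max:01/31/2017'
```

The resulting string can be passed as the ``tbs`` argument of ``search()`` to restrict results to that date range.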
151 | ||
152 | ||
153 | # Request the given URL and return the response page, using the cookie jar. | |
154 | # If the cookie jar is inaccessible, the errors are ignored. | |
155 | def get_page(url, user_agent=None): | |
156 | """ | |
157 | Request the given URL and return the response page, using the cookie jar. | |
158 | ||
159 | :param str url: URL to retrieve. | |
160 | :param str user_agent: User agent for the HTTP requests. | |
161 | Use None for the default. | |
162 | ||
163 | :rtype: str | |
164 | :return: Web page retrieved for the given URL. | |
165 | ||
166 | :raises IOError: An exception is raised on error. | |
167 | :raises urllib2.URLError: An exception is raised on error. | |
168 | :raises urllib2.HTTPError: An exception is raised on error. | |
169 | """ | |
170 | if user_agent is None: | |
171 | user_agent = USER_AGENT | |
172 | request = Request(url) | |
173 | request.add_header('User-Agent', user_agent) | |
174 | cookie_jar.add_cookie_header(request) | |
175 | response = urlopen(request) | |
176 | cookie_jar.extract_cookies(response, request) | |
177 | html = response.read() | |
178 | response.close() | |
179 | try: | |
180 | cookie_jar.save() | |
181 | except Exception: | |
182 | pass | |
183 | return html | |
184 | ||
185 | ||
186 | # Filter links found in the Google result pages HTML code. | |
187 | # Returns None if the link doesn't yield a valid result. | |
188 | def filter_result(link): | |
189 | try: | |
190 | ||
191 | # Decode hidden URLs. | |
192 | if link.startswith('/url?'): | |
193 | o = urlparse(link, 'http') | |
194 | link = parse_qs(o.query)['q'][0] | |
195 | ||
196 | # Valid results are absolute URLs not pointing to a Google domain, | |
197 | # such as images.google.com or googleusercontent.com. | |
198 | # TODO this could be improved! | |
199 | o = urlparse(link, 'http') | |
200 | if o.netloc and 'google' not in o.netloc: | |
201 | return link | |
202 | ||
203 | # On error, return None. | |
204 | except Exception: | |
205 | pass | |
206 | ||
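The two cases in ``filter_result()`` can be exercised in isolation: a ``/url?q=...`` redirect link is decoded with ``urlparse``/``parse_qs``, and any absolute URL on a Google domain is rejected. A self-contained sketch (the example links are made up):

```python
# Standalone copy of the filtering logic above (Python 3 imports).
from urllib.parse import urlparse, parse_qs

def demo_filter(link):  # illustrative stand-in for filter_result()
    # Decode hidden "/url?q=..." redirect links.
    if link.startswith('/url?'):
        link = parse_qs(urlparse(link, 'http').query)['q'][0]
    # Keep only absolute URLs outside Google's own domains.
    o = urlparse(link, 'http')
    if o.netloc and 'google' not in o.netloc:
        return link
    return None

assert demo_filter('/url?q=https://example.com/page&sa=U') == 'https://example.com/page'
assert demo_filter('https://images.google.com/imgres') is None
```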
207 | ||
208 | # Returns a generator that yields URLs. | |
209 | def search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0, | |
210 | stop=None, domains=None, pause=2.0, tpe='', country='', | |
211 | extra_params=None, user_agent=None): | |
212 | """ | |
213 | Search the given query string using Google. | |
214 | ||
215 | :param str query: Query string. Must NOT be url-encoded. | |
216 | :param str tld: Top level domain. | |
217 | :param str lang: Language. | |
218 | :param str tbs: Time limits (e.g. "qdr:h" => last hour, | |
219 | "qdr:d" => last 24 hours, "qdr:m" => last month). | |
220 | :param str safe: Safe search. | |
221 | :param int num: Number of results per page. | |
222 | :param int start: First result to retrieve. | |
223 | :param int stop: Last result to retrieve. | |
224 | Use None to keep searching forever. | |
225 | :param list domains: A list of web domains to constrain | |
226 | the search. | |
227 | :param float pause: Lapse to wait between HTTP requests. | |
228 | A lapse too long will make the search slow, but a lapse too short may | |
229 | cause Google to block your IP. Your mileage may vary! | |
230 | :param str tpe: Search type (images, videos, news, shopping, books, apps) | |
231 | Use the following values {videos: 'vid', images: 'isch', | |
232 | news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'} | |
233 | :param str country: Country or region to focus the search on. Similar to | |
234 | changing the TLD, but does not yield exactly the same results. | |
235 | Only Google knows why... | |
236 | :param dict extra_params: A dictionary of extra HTTP GET | |
237 | parameters, which must be URL encoded. For example if you don't want | |
238 | Google to filter similar results you can set the extra_params to | |
239 | {'filter': '0'} which will append '&filter=0' to every query. | |
240 | :param str user_agent: User agent for the HTTP requests. | |
241 | Use None for the default. | |
242 | ||
243 | :rtype: generator of str | |
244 | :return: Generator (iterator) that yields found URLs. | |
245 | If the stop parameter is None the iterator will loop forever. | |
246 | """ | |
247 | # Set of hashes for the results found. | |
248 | # This is used to avoid repeated results. | |
249 | hashes = set() | |
250 | ||
251 | # Count the number of links yielded. | |
252 | count = 0 | |
253 | ||
254 | # Prepare domain list if it exists. | |
255 | if domains: | |
256 | query = query + ' ' + ' OR '.join( | |
257 | 'site:' + domain for domain in domains) | |
258 | ||
259 | # Prepare the search string. | |
260 | query = quote_plus(query) | |
261 | ||
262 | # If no extra_params is given, create an empty dictionary. | |
263 | # We should avoid using an empty dictionary as a default value | |
264 | # in a function parameter in Python. | |
265 | if not extra_params: | |
266 | extra_params = {} | |
267 | ||
268 | # Check extra_params for overlap with the built-in GET parameters. | |
269 | for builtin_param in url_parameters: | |
270 | if builtin_param in extra_params: | |
271 | raise ValueError( | |
272 | 'GET parameter "%s" overlaps with ' | |
273 | 'a built-in GET parameter' | |
274 | % builtin_param | |
275 | ) | |
276 | ||
277 | # Grab the cookie from the home page. | |
278 | get_page(url_home % vars(), user_agent) | |
279 | ||
280 | # Prepare the URL of the first request. | |
281 | if start: | |
282 | if num == 10: | |
283 | url = url_next_page % vars() | |
284 | else: | |
285 | url = url_next_page_num % vars() | |
286 | else: | |
287 | if num == 10: | |
288 | url = url_search % vars() | |
289 | else: | |
290 | url = url_search_num % vars() | |
291 | ||
292 | # Loop until we reach the maximum result, if any (otherwise, loop forever). | |
293 | while not stop or count < stop: | |
294 | ||
295 | # Remember last count to detect the end of results. | |
296 | last_count = count | |
297 | ||
298 | # Append extra GET parameters to the URL. | |
299 | # This is done on every iteration because we're | |
300 | # rebuilding the entire URL at the end of this loop. | |
301 | for k, v in extra_params.items(): | |
302 | k = quote_plus(k) | |
303 | v = quote_plus(v) | |
304 | url = url + ('&%s=%s' % (k, v)) | |
305 | ||
306 | # Sleep between requests. | |
307 | # Keeps Google from banning you for making too many requests. | |
308 | time.sleep(pause) | |
309 | ||
310 | # Request the Google Search results page. | |
311 | html = get_page(url, user_agent) | |
312 | ||
313 | # Parse the response and get every anchored URL. | |
314 | if is_bs4: | |
315 | soup = BeautifulSoup(html, 'html.parser') | |
316 | else: | |
317 | soup = BeautifulSoup(html) | |
318 | try: | |
319 | anchors = soup.find(id='search').findAll('a') | |
320 | # Sometimes (depending on the User-agent) there is | |
321 | # no id "search" in html response... | |
322 | except AttributeError: | |
323 | # Remove links of the top bar. | |
324 | gbar = soup.find(id='gbar') | |
325 | if gbar: | |
326 | gbar.clear() | |
327 | anchors = soup.findAll('a') | |
328 | ||
329 | # Process every anchored URL. | |
330 | for a in anchors: | |
331 | ||
332 | # Get the URL from the anchor tag. | |
333 | try: | |
334 | link = a['href'] | |
335 | except KeyError: | |
336 | continue | |
337 | ||
338 | # Filter invalid links and links pointing to Google itself. | |
339 | link = filter_result(link) | |
340 | if not link: | |
341 | continue | |
342 | ||
343 | # Discard repeated results. | |
344 | h = hash(link) | |
345 | if h in hashes: | |
346 | continue | |
347 | hashes.add(h) | |
348 | ||
349 | # Yield the result. | |
350 | yield link | |
351 | ||
352 | # Increase the results counter. | |
353 | # If we reached the limit, stop. | |
354 | count += 1 | |
355 | if stop and count >= stop: | |
356 | return | |
357 | ||
358 | # End if there are no more results. | |
359 | # XXX TODO review this logic, not sure if this is still true! | |
360 | if last_count == count: | |
361 | break | |
362 | ||
363 | # Prepare the URL for the next request. | |
364 | start += num | |
365 | if num == 10: | |
366 | url = url_next_page % vars() | |
367 | else: | |
368 | url = url_next_page_num % vars() | |


# Shortcut to search images.
# Beware, this does not return the image link.
def search_images(*args, **kwargs):
    """
    Shortcut to search images.

    Same arguments and return value as the main search function.

    :note: Beware, this does not return the image link.
    """
    kwargs['tpe'] = 'isch'
    return search(*args, **kwargs)


# Shortcut to search news.
def search_news(*args, **kwargs):
    """
    Shortcut to search news.

    Same arguments and return value as the main search function.
    """
    kwargs['tpe'] = 'nws'
    return search(*args, **kwargs)


# Shortcut to search videos.
def search_videos(*args, **kwargs):
    """
    Shortcut to search videos.

    Same arguments and return value as the main search function.
    """
    kwargs['tpe'] = 'vid'
    return search(*args, **kwargs)


# Shortcut to search shop.
def search_shop(*args, **kwargs):
    """
    Shortcut to search shop.

    Same arguments and return value as the main search function.
    """
    kwargs['tpe'] = 'shop'
    return search(*args, **kwargs)


# Shortcut to search books.
def search_books(*args, **kwargs):
    """
    Shortcut to search books.

    Same arguments and return value as the main search function.
    """
    kwargs['tpe'] = 'bks'
    return search(*args, **kwargs)


# Shortcut to search apps.
def search_apps(*args, **kwargs):
    """
    Shortcut to search apps.

    Same arguments and return value as the main search function.
    """
    kwargs['tpe'] = 'app'
    return search(*args, **kwargs)


# Shortcut to single-item search.
# Evaluates the iterator to return the single URL as a string.
def lucky(*args, **kwargs):
    """
    Shortcut to single-item search.

    Same arguments as the main search function, but the return value changes.

    :rtype: str
    :return: URL found by Google.
    """
    return next(search(*args, **kwargs))
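All of the shortcuts above follow one pattern: forward `*args`/`**kwargs` to the main `search()` function with the `tpe` keyword preset. A self-contained sketch of that pattern, using a stand-in `fake_search` instead of the real network-bound function (the names `fake_search` and `make_shortcut` are ours, not part of the module):

```python
def fake_search(query, tpe=''):
    # Stand-in for the real search(): just report what it was asked to do.
    return (query, tpe)

def make_shortcut(tpe):
    # Build a shortcut that presets the search type,
    # the way search_news / search_books etc. do above.
    def shortcut(*args, **kwargs):
        kwargs['tpe'] = tpe
        return fake_search(*args, **kwargs)
    return shortcut

search_news_sketch = make_shortcut('nws')
print(search_news_sketch('python'))  # ('python', 'nws')
```

Forwarding through `**kwargs` means every other option of the main function remains available to each shortcut without repeating its signature.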
__version__ = "1.0.0"
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36 OPR/49.0.2725.47
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.5.17 (KHTML, like Gecko) Version/8.0.5 Safari/600.5.17
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; rv:57.0) Gecko/20100101 Firefox/57.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36 OPR/49.0.2725.39
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8
Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36 OPR/48.0.2685.52
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.75 Chrome/62.0.3202.75 Safari/537.36
Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.94 Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Trident/5.0)
Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko
Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (iPad; CPU OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B202 Safari/604.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; Trident/5.0)
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0) Gecko/20100101 Firefox/56.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
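This file (browser_agents.txt) is read by the library with `splitlines()`, and one entry is drawn at random with `random.choice` to set the session's User-Agent header. A sketch of that selection with a trimmed two-entry list (the full list is loaded from the packaged file):

```python
from random import choice

# Two entries from the list above stand in for the full file contents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0",
]

# Same header shape as GoogleSearch.DEFAULT_HEADERS below.
headers = [('User-Agent', choice(USER_AGENTS)),
           ("Accept-Language", "en-US,en;q=0.5")]
print(headers[0][0])  # User-Agent
```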
# Contributors:
# https://github.com/anthonyhseb
# https://github.com/rakeshsagalagatte
# https://github.com/hildogjr

import sys
if sys.version_info[0] > 2:
    import urllib.request as urllib
else:
    import urllib2 as urllib
import math
import re
from bs4 import BeautifulSoup
from multiprocessing.pool import ThreadPool  # To deal with the parallel scrape.
from random import choice
from time import sleep
from pkg_resources import resource_filename
from contextlib import closing


class GoogleSearch:
    with open(resource_filename('googlesearch', 'browser_agents.txt'), 'r') as file_handle:
        USER_AGENTS = file_handle.read().splitlines()
    SEARCH_URL = "https://google.com/search"
    RESULT_SELECTOR = "div.g"
    RESULT_SELECTOR_PAGE1 = "div.g>div>div[id][data-ved]"
    TOTAL_SELECTOR = "#result-stats"
    RESULTS_PER_PAGE = 10
    DEFAULT_HEADERS = [
        ('User-Agent', choice(USER_AGENTS)),
        ("Accept-Language", "en-US,en;q=0.5"),
    ]

    def search(self,
               query,
               num_results=10,
               prefetch_pages=True,
               num_prefetch_threads=10):
        '''Perform the Google search.

        Parameters:
            query: string to search for.
            num_results: minimum number of results at which the search stops.
            prefetch_pages: whether to prefetch the result pages.
            num_prefetch_threads: number of threads used to prefetch the pages.
        '''
        search_results = []
        pages = int(math.ceil(num_results / float(GoogleSearch.RESULTS_PER_PAGE)))
        total = None
        thread_pool = None
        if prefetch_pages:
            thread_pool = ThreadPool(num_prefetch_threads)
        for i in range(pages):
            start = i * GoogleSearch.RESULTS_PER_PAGE
            opener = urllib.build_opener()
            opener.addheaders = GoogleSearch.DEFAULT_HEADERS
            with closing(opener.open(GoogleSearch.SEARCH_URL +
                                     "?hl=en&q=" + urllib.quote(query) +
                                     ("" if start == 0 else
                                      ("&start=" + str(start))))) as response:
                soup = BeautifulSoup(response.read(), "lxml")
            if total is None:
                if sys.version_info[0] > 2:
                    totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.__next__()
                else:
                    totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.next()
                total = int(re.sub("[', ]", "",
                                   re.search("(([0-9]+[', ])*[0-9]+)",
                                             totalText).group(1)))
            selector = GoogleSearch.RESULT_SELECTOR_PAGE1 if i == 0 else GoogleSearch.RESULT_SELECTOR
            self.results = self.parse_results(soup.select(selector), i)
            # if len(search_results) + len(self.results) > num_results:
            #     del self.results[num_results - len(search_results):]
            search_results += self.results
            if prefetch_pages:
                thread_pool.map_async(SearchResult.get_text, self.results)
        if prefetch_pages:
            thread_pool.close()
            thread_pool.join()
        return SearchResponse(search_results, total)

    def parse_results(self, results, page):
        search_results = []
        for result in results:
            if page == 0:
                result = result.parent
            else:
                result = result.find("div")
            h3 = result.find("h3")
            if h3 is None:
                continue
            url = h3.parent["href"]
            title = h3.text
            search_results.append(SearchResult(title, url))
        return search_results


class SearchResponse:
    def __init__(self, results, total):
        self.results = results
        self.total = total


class SearchResult:
    def __init__(self, title, url):
        self.title = title
        self.url = url
        self.__text = None
        self.__markup = None

    def get_text(self):
        if self.__text is None:
            soup = BeautifulSoup(self.get_markup(), "lxml")
            for junk in soup(['style', 'script', 'head', 'title', 'meta']):
                junk.extract()
            self.__text = soup.get_text()
        return self.__text

    def get_markup(self):
        if self.__markup is None:
            opener = urllib.build_opener()
            opener.addheaders = GoogleSearch.DEFAULT_HEADERS
            response = opener.open(self.url)
            self.__markup = response.read()
        return self.__markup

    def __str__(self):
        return str(self.__dict__)

    def __unicode__(self):
        return str(self.__str__())

    def __repr__(self):
        return self.__str__()


# Main entry for test and external script use.
if __name__ == "__main__":
    import sys
    if len(sys.argv) == 1:  # Only the file name.
        query = "python"
    else:
        query = " ".join(sys.argv[1:])
    search = GoogleSearch()
    num_results = 10
    print("Fetching first " + str(num_results) + " results for \"" + query + "\"...")
    response = search.search(query, num_results, prefetch_pages=True)
    print("TOTAL: " + str(response.total) + " RESULTS")
    for count, result in enumerate(response.results):
        print("RESULT #" + str(count + 1) + ":")
        print((result._SearchResult__text.strip()
               if result._SearchResult__text is not None else "[None]") + "\n\n")
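`GoogleSearch.search()` above derives how many result pages to fetch with a ceiling division over `RESULTS_PER_PAGE`. A small sketch of that computation in isolation (the `pages_needed` helper name is ours):

```python
import math

RESULTS_PER_PAGE = 10

def pages_needed(num_results, per_page=RESULTS_PER_PAGE):
    # Same computation as GoogleSearch.search(): enough pages to cover
    # at least num_results results at per_page results per page.
    return int(math.ceil(num_results / float(per_page)))

print([pages_needed(n) for n in (1, 10, 11, 25)])  # [1, 1, 2, 3]
```

The `float()` cast matters on Python 2, where `/` between two ints is floor division; on Python 3 alone, `math.ceil(num_results / per_page)` would suffice.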
#!/usr/bin/env python

# Python bindings to the Google search engine
# Copyright (c) 2009-2019, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

import sys

from googlesearch import search, get_random_user_agent

# TODO port to argparse
from optparse import OptionParser, IndentedHelpFormatter


class BannerHelpFormatter(IndentedHelpFormatter):

    "Just a small tweak to optparse to be able to print a banner."

    def __init__(self, banner, *argv, **argd):
        self.banner = banner
        IndentedHelpFormatter.__init__(self, *argv, **argd)

    def format_usage(self, usage):
        msg = IndentedHelpFormatter.format_usage(self, usage)
        return '%s\n%s' % (self.banner, msg)


def main():

    # Parse the command line arguments.
    formatter = BannerHelpFormatter(
        "Python script to use the Google search engine\n"
        "By Mario Vilas (mvilas at gmail dot com)\n"
        "https://github.com/MarioVilas/googlesearch\n"
    )
    parser = OptionParser(formatter=formatter)
    parser.set_usage("%prog [options] query")
    parser.add_option(
        '--tld', metavar='TLD', type='string', default='com',
        help="top level domain to use [default: com]")
    parser.add_option(
        '--lang', metavar='LANGUAGE', type='string', default='en',
        help="produce results in the given language [default: en]")
    parser.add_option(
        '--domains', metavar='DOMAINS', type='string', default='',
        help="comma separated list of domains to constrain the search to")
    parser.add_option(
        '--tbs', metavar='TBS', type='string', default='0',
        help="produce results from period [default: 0]")
    parser.add_option(
        '--safe', metavar='SAFE', type='string', default='off',
        help="kids safe search [default: off]")
    parser.add_option(
        '--type', metavar='TYPE', type='string', default='search', dest='tpe',
        help="search type (search, images, videos, news, shopping, books,"
             " apps) [default: search]")
    parser.add_option(
        '--country', metavar='COUNTRY', type='string', default='',
        help="region to restrict search on [default: not restricted]")
    parser.add_option(
        '--num', metavar='NUMBER', type='int', default=10,
        help="number of results per page [default: 10]")
    parser.add_option(
        '--start', metavar='NUMBER', type='int', default=0,
        help="first result to retrieve [default: 0]")
    parser.add_option(
        '--stop', metavar='NUMBER', type='int', default=0,
        help="last result to retrieve [default: unlimited]")
    parser.add_option(
        '--pause', metavar='SECONDS', type='float', default=2.0,
        help="pause between HTTP requests [default: 2.0]")
    parser.add_option(
        '--rua', metavar='USERAGENT', action='store_true', default=False,
        help="Randomize the User-Agent [default: no]")
    (options, args) = parser.parse_args()
    query = ' '.join(args)
    if not query:
        parser.print_help()
        sys.exit(2)
    params = [
        (k, v) for (k, v) in options.__dict__.items()
        if not k.startswith('_')]
    params = dict(params)

    # Split the comma separated list of domains, if present.
    if 'domains' in params:
        params['domains'] = [x.strip() for x in params['domains'].split(',')]

    # Use a special search type if requested.
    if 'tpe' in params:
        tpe = params['tpe']
        if tpe and tpe not in (
                'search', 'images', 'videos', 'news',
                'shopping', 'books', 'apps'):
            parser.error("invalid type: %r" % tpe)
        if tpe == 'search':
            params['tpe'] = ''

    # Randomize the user agent if requested.
    if 'rua' in params and params.pop('rua'):
        params['user_agent'] = get_random_user_agent()

    # Run the query.
    for url in search(query, **params):
        print(url)
        try:
            sys.stdout.flush()
        except Exception:
            pass


if __name__ == '__main__':
    main()
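`main()` above turns the optparse result into keyword arguments for `search()` by filtering private attributes out of `options.__dict__` and post-processing a few entries. The same idea can be sketched with a hypothetical `FakeOptions` stand-in (the class name is ours, for illustration only):

```python
class FakeOptions:
    # Stand-in for the object optparse returns from parse_args().
    def __init__(self):
        self.tld = 'com'
        self.domains = 'example.com, example.org'
        self._private = 'dropped by the filter'

options = FakeOptions()

# Keep only public attributes, mirroring the params construction in main().
params = dict(
    (k, v) for (k, v) in options.__dict__.items()
    if not k.startswith('_'))

# Post-process the comma separated --domains value, as main() does.
params['domains'] = [x.strip() for x in params['domains'].split(',')]
print(params['domains'])  # ['example.com', 'example.org']
```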
[bumpversion]
current_version = 1.1.1
commit = True
tag = True

[bumpversion:file:setup.py]
search = version='{current_version}'
replace = version='{new_version}'

[bumpversion:file:googlesearch/__init__.py]
search = __version__ = '{current_version}'
replace = __version__ = '{new_version}'

[bdist_wheel]
universal = 1
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from setuptools import setup

with open('README.rst') as readme_file:
    readme = readme_file.read()

with open('HISTORY.rst') as history_file:
    history = history_file.read()

requirements = [
    'beautifulsoup4',
    'lxml',
    'soupsieve'
]

test_requirements = [
]

setup(
    name='google-search',
    version='1.1.1',
    description="Library for scraping google search results",
    long_description=readme + '\n\n' + history,
    author="Anthony Hseb",
    author_email='[email protected]',
    url='https://github.com/anthonyhseb/googlesearch',
    packages=[
        'googlesearch',
    ],
    package_dir={'googlesearch':
                 'googlesearch'},
    include_package_data=True,
    install_requires=requirements,
    license="MIT license",
    zip_safe=False,
    keywords='googlesearch',
    classifiers=[
        'Development Status :: 2 - Pre-Alpha',
        'Intended Audience :: Developers',
        'License :: OSI Approved :: MIT License',
        'Natural Language :: English',
        "Programming Language :: Python :: 2",
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.8',
    ],
    test_suite='tests',
    tests_require=test_requirements
)

#!/usr/bin/env python

# Copyright (c) 2009-2019, Mario Vilas
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#     * Redistributions of source code must retain the above copyright notice,
#       this list of conditions and the following disclaimer.
#     * Redistributions in binary form must reproduce the above copyright
#       notice, this list of conditions and the following disclaimer in the
#       documentation and/or other materials provided with the distribution.
#     * Neither the name of the copyright holder nor the names of its
#       contributors may be used to endorse or promote products derived from
#       this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.

from os import chdir
from os.path import abspath, join, split

# Make sure we are standing in the correct directory.
# Old versions of distutils didn't take care of this.
here = split(abspath(__file__))[0]
chdir(here)

# Package metadata.
metadata = dict(
    name='google',
    provides=['googlesearch'],
    requires=['beautifulsoup4'],
    packages=['googlesearch'],
    scripts=[join('scripts', 'google')],
    package_data={'googlesearch': ['user_agents.txt.gz']},
    include_package_data=True,
    version="2.0.3",
    description="Python bindings to the Google search engine.",
    author="Mario Vilas",
    author_email="[email protected]",
    url="http://breakingcode.wordpress.com/",
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: BSD License",
        "Environment :: Console",
        "Programming Language :: Python",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
)

# Prefer setuptools over the old distutils.
# If setuptools is available, use install_requires.
try:
    from setuptools import setup
    metadata['install_requires'] = metadata['requires']
except ImportError:
    from distutils.core import setup
"""
# Get the long description from the readme file.
try:
    metadata['long_description'] = open(join(here, 'README.md'), 'rU').read()
except Exception:
    pass

# If twine is installed, set the long description content type.
try:
    import twine
    metadata['long_description_content_type'] = 'text/markdown'
except ImportError:
    pass
"""
# Run the setup script.
setup(**metadata)
'''
Created on May 6, 2017

@author: anthony
'''
import unittest
from googlesearch.googlesearch import GoogleSearch


class TestGoogleSearch(unittest.TestCase):

    def test_search(self):
        num_results = 15
        min_results = 11
        max_results = 20
        response = GoogleSearch().search("unittest", num_results=num_results)
        self.assertTrue(response.total > 1000, "response.total is way too low")
        self.assertTrue(len(response.results) >= min_results,
                        "number of results is " + str(len(response.results)) +
                        ", expected at least " + str(min_results))
        self.assertTrue(len(response.results) <= max_results,
                        "number of results is " + str(len(response.results)) +
                        ", expected at most " + str(max_results))
        for result in response.results:
            self.assertTrue(result.url is not None, "result.url is None")
            self.assertTrue(result.url.startswith("http"),
                            "result.url is invalid: " + result.url)
        for result in response.results:
            self.assertTrue(result.get_text() is not None, "result.text is None")


if __name__ == '__main__':
    unittest.main()