python-googlesearch / 7775a6e
New upstream snapshot. Kali Janitor 1 year, 4 months ago
40 changed file(s) with 1522 addition(s) and 742 deletion(s).
0 =======
1 Credits
2 =======
3
4 Developer
5 ---------
6
7 * https://github.com/anthonyhseb
8
9 Contributors
10 ------------
11
12 * https://github.com/rakeshsagalagatte
13 * https://github.com/hildogjr
0 .. highlight:: shell
1
2 ============
3 Contributing
4 ============
5
6 Contributions are welcome, and they are greatly appreciated! Every
7 little bit helps, and credit will always be given.
8
9 You can contribute in many ways:
10
11 Types of Contributions
12 ----------------------
13
14 Report Bugs
15 ~~~~~~~~~~~
16
17 Report bugs at https://github.com/anthonyhseb/googlesearch/issues.
18
19 If you are reporting a bug, please include:
20
21 * Your operating system name and version.
22 * Any details about your local setup that might be helpful in troubleshooting.
23 * Detailed steps to reproduce the bug.
24
25 Fix Bugs
26 ~~~~~~~~
27
28 Look through the GitHub issues for bugs. Anything tagged with "bug"
29 and "help wanted" is open to whoever wants to implement it.
30
31 Implement Features
32 ~~~~~~~~~~~~~~~~~~
33
34 Look through the GitHub issues for features. Anything tagged with "enhancement"
35 and "help wanted" is open to whoever wants to implement it.
36
37 Write Documentation
38 ~~~~~~~~~~~~~~~~~~~
39
40 google-search could always use more documentation, whether as part of the
41 official google-search docs, in docstrings, or even on the web in blog posts,
42 articles, and such.
43
44 Submit Feedback
45 ~~~~~~~~~~~~~~~
46
47 The best way to send feedback is to file an issue at https://github.com/anthonyhseb/googlesearch/issues.
48
49 If you are proposing a feature:
50
51 * Explain in detail how it would work.
52 * Keep the scope as narrow as possible, to make it easier to implement.
53 * Remember that this is a volunteer-driven project, and that contributions
54 are welcome :)
55
56 Get Started!
57 ------------
58
59 Ready to contribute? Here's how to set up `googlesearch` for local development.
60
61 1. Fork the `googlesearch` repo on GitHub.
62 2. Clone your fork locally::
63
64 $ git clone git@github.com:your_name_here/googlesearch.git
65
66 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
67
68 $ mkvirtualenv googlesearch
69 $ cd googlesearch/
70 $ python setup.py develop
71
72 4. Create a branch for local development::
73
74 $ git checkout -b name-of-your-bugfix-or-feature
75
76 Now you can make your changes locally.
77
78 5. When you're done making changes, check that your changes pass the tests, including testing other Python versions with tox::
79
80 $ python setup.py test    # or: py.test
81 $ tox
82
83 6. Commit your changes and push your branch to GitHub::
84
85 $ git add .
86 $ git commit -m "Your detailed description of your changes."
87 $ git push origin name-of-your-bugfix-or-feature
88
89 7. Submit a pull request through the GitHub website.
90
91 Pull Request Guidelines
92 -----------------------
93
94 Before you submit a pull request, check that it meets these guidelines:
95
96 1. The pull request should include tests.
97 2. If the pull request adds functionality, the docs should be updated. Put
98 your new functionality into a function with a docstring, and add the
99 feature to the list in README.rst.
100 3. The pull request should work for Python 2.7, 3.6 and 3.8. Check
101 https://travis-ci.org/anthonyhseb/googlesearch/pull_requests
102 and make sure that the tests pass for all supported Python versions.
103
104 Tips
105 ----
106
107 To run a subset of tests::
108
109
110 $ python -m unittest tests.test_googlesearch
0 =======
1 History
2 =======
3
4 1.0.0 (2017-05-06)
5 ------------------
6
7 * First release on PyPI.
0
1 MIT License
2
3 Copyright (c) 2017, Anthony Hseb
4
5 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6
7 The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8
9 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10
0 include README.md
1 include MANIFEST.in
2 include setup.py
3 include scripts/google
4 include requirements.txt
5 include googlesearch/user_agents.txt.gz
0
1 include AUTHORS.rst
2
3 include CONTRIBUTING.rst
4 include HISTORY.rst
5 include LICENSE
6 include README.rst
7 include googlesearch/browser_agents.txt
8
9 recursive-include tests *
10 recursive-exclude * __pycache__
11 recursive-exclude * *.py[co]
12
13 recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif
0 Metadata-Version: 1.1
1 Name: google
2 Version: 2.0.3
3 Summary: Python bindings to the Google search engine.
4 Home-page: http://breakingcode.wordpress.com/
5 Author: Mario Vilas
6 Author-email: [email protected]
7 License: UNKNOWN
8 Description: UNKNOWN
9 Platform: UNKNOWN
10 Classifier: Development Status :: 5 - Production/Stable
0 Metadata-Version: 2.1
1 Name: google-search
2 Version: 1.1.1
3 Summary: Library for scraping google search results
4 Home-page: https://github.com/anthonyhseb/googlesearch
5 Author: Anthony Hseb
6 Author-email: [email protected]
7 License: MIT license
8 Keywords: googlesearch
9 Classifier: Development Status :: 2 - Pre-Alpha
1110 Classifier: Intended Audience :: Developers
12 Classifier: License :: OSI Approved :: BSD License
13 Classifier: Environment :: Console
14 Classifier: Programming Language :: Python
15 Classifier: Topic :: Software Development :: Libraries :: Python Modules
16 Requires: beautifulsoup4
17 Provides: googlesearch
11 Classifier: License :: OSI Approved :: MIT License
12 Classifier: Natural Language :: English
13 Classifier: Programming Language :: Python :: 2
14 Classifier: Programming Language :: Python :: 2.7
15 Classifier: Programming Language :: Python :: 3
16 Classifier: Programming Language :: Python :: 3.6
17 Classifier: Programming Language :: Python :: 3.8
18 License-File: LICENSE
19 License-File: AUTHORS.rst
20
21 =============
22 google-search
23 =============
24
25
26 .. image:: https://img.shields.io/pypi/v/google-search.svg
27 :target: https://pypi.python.org/pypi/google-search
28
29 .. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg
30 :target: https://travis-ci.org/anthonyhseb/googlesearch
31
32 .. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest
33 :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest
34 :alt: Documentation Status
35
36 .. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg
37 :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/
38 :alt: Updates
39
40
41 Library for scraping Google search results.
42
43 * Usage::
44
45 from googlesearch.googlesearch import GoogleSearch
46 response = GoogleSearch().search("something")
47 for result in response.results:
48 print("Title: " + result.title)
49 print("Content: " + result.getText())
50
51
52
53 * Free software: MIT license
54
55 Features
56 --------
57
58 Run a Google search and fetch the individual results (full HTML and text contents). By default, the result URLs are fetched eagerly when the search request is made, using 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``getMarkup()`` is called, by passing ``prefetch_results = False`` to the search method.
59
60 Pass ``num_results`` to the search method to set the maximum number of results.
61
62 ``SearchReponse.total`` gives the total number of results on Google.
63
64 Credits
65 ---------
66
67 This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
68
69 .. _Cookiecutter: https://github.com/audreyr/cookiecutter
70 .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
71
72
73
74 =======
75 History
76 =======
77
78 1.0.0 (2017-05-06)
79 ------------------
80
81 * First release on PyPI.
+0
-19
README.md
0 googlesearch
1 ============
2
3 Google search from Python.
4
5 https://python-googlesearch.readthedocs.io/en/latest/
6
7 Usage example
8 -------------
9
10 # Get the first 20 hits for: "Breaking Code" WordPress blog
11 from googlesearch import search
12 for url in search('"Breaking Code" WordPress blog', stop=20):
13 print(url)
14
15 Installing
16 ----------
17
18 pip install google
0 =============
1 google-search
2 =============
3
4
5 .. image:: https://img.shields.io/pypi/v/google-search.svg
6 :target: https://pypi.python.org/pypi/google-search
7
8 .. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg
9 :target: https://travis-ci.org/anthonyhseb/googlesearch
10
11 .. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest
12 :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest
13 :alt: Documentation Status
14
15 .. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg
16 :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/
17 :alt: Updates
18
19
20 Library for scraping Google search results.
21
22 * Usage::
23
24 from googlesearch.googlesearch import GoogleSearch
25 response = GoogleSearch().search("something")
26 for result in response.results:
27 print("Title: " + result.title)
28 print("Content: " + result.getText())
29
30
31
32 * Free software: MIT license
33
34 Features
35 --------
36
37 Run a Google search and fetch the individual results (full HTML and text contents). By default, the result URLs are fetched eagerly when the search request is made, using 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``getMarkup()`` is called, by passing ``prefetch_results = False`` to the search method.
38
39 Pass ``num_results`` to the search method to set the maximum number of results.
40
41 ``SearchReponse.total`` gives the total number of results on Google.
42
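A minimal sketch of deferred fetching, assuming the keyword arguments described above (``prefetch_results`` and ``num_results``)::

    from googlesearch.googlesearch import GoogleSearch

    # prefetch_results=False defers fetching of the result pages; num_results
    # caps how many results are retrieved (both names as described above).
    response = GoogleSearch().search("something", prefetch_results=False, num_results=5)
    for result in response.results:
        text = result.getText()  # the result page is fetched here, on first access
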
43 Credits
44 ---------
45
46 This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
47
48 .. _Cookiecutter: https://github.com/audreyr/cookiecutter
49 .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
50
0 python-googlesearch (2.0.3+git20210326.1.e2d3e74-0kali1) UNRELEASED; urgency=low
1
2 * New upstream snapshot.
3
4 -- Kali Janitor <[email protected]> Thu, 29 Dec 2022 09:28:17 -0000
5
06 python-googlesearch (2.0.3-0kali1) kali-dev; urgency=medium
17
28 [ Sophie Brun ]
0 # Makefile for Sphinx documentation
1 #
2
3 # You can set these variables from the command line.
4 SPHINXOPTS =
5 SPHINXBUILD = sphinx-build
6 PAPER =
7 BUILDDIR = _build
8
9 # User-friendly check for sphinx-build
10 ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
11 $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
12 endif
13
14 # Internal variables.
15 PAPEROPT_a4 = -D latex_paper_size=a4
16 PAPEROPT_letter = -D latex_paper_size=letter
17 ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
18 # the i18n builder cannot share the environment and doctrees with the others
19 I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
20
21 .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
22
23 help:
24 @echo "Please use \`make <target>' where <target> is one of"
25 @echo " html to make standalone HTML files"
26 @echo " dirhtml to make HTML files named index.html in directories"
27 @echo " singlehtml to make a single large HTML file"
28 @echo " pickle to make pickle files"
29 @echo " json to make JSON files"
30 @echo " htmlhelp to make HTML files and a HTML help project"
31 @echo " qthelp to make HTML files and a qthelp project"
32 @echo " devhelp to make HTML files and a Devhelp project"
33 @echo " epub to make an epub"
34 @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
35 @echo " latexpdf to make LaTeX files and run them through pdflatex"
36 @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
37 @echo " text to make text files"
38 @echo " man to make manual pages"
39 @echo " texinfo to make Texinfo files"
40 @echo " info to make Texinfo files and run them through makeinfo"
41 @echo " gettext to make PO message catalogs"
42 @echo " changes to make an overview of all changed/added/deprecated items"
43 @echo " xml to make Docutils-native XML files"
44 @echo " pseudoxml to make pseudoxml-XML files for display purposes"
45 @echo " linkcheck to check all external links for integrity"
46 @echo " doctest to run all doctests embedded in the documentation (if enabled)"
47
48 clean:
49 rm -rf $(BUILDDIR)/*
50
51 html:
52 $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
53 @echo
54 @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
55
56 dirhtml:
57 $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
58 @echo
59 @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
60
61 singlehtml:
62 $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
63 @echo
64 @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
65
66 pickle:
67 $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
68 @echo
69 @echo "Build finished; now you can process the pickle files."
70
71 json:
72 $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
73 @echo
74 @echo "Build finished; now you can process the JSON files."
75
76 htmlhelp:
77 $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
78 @echo
79 @echo "Build finished; now you can run HTML Help Workshop with the" \
80 ".hhp project file in $(BUILDDIR)/htmlhelp."
81
82 qthelp:
83 $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
84 @echo
85 @echo "Build finished; now you can run "qcollectiongenerator" with the" \
86 ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
87 @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/googlesearch.qhcp"
88 @echo "To view the help file:"
89 @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/googlesearch.qhc"
90
91 devhelp:
92 $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
93 @echo
94 @echo "Build finished."
95 @echo "To view the help file:"
96 @echo "# mkdir -p $$HOME/.local/share/devhelp/googlesearch"
97 @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/googlesearch"
98 @echo "# devhelp"
99
100 epub:
101 $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
102 @echo
103 @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
104
105 latex:
106 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
107 @echo
108 @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
109 @echo "Run \`make' in that directory to run these through (pdf)latex" \
110 "(use \`make latexpdf' here to do that automatically)."
111
112 latexpdf:
113 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
114 @echo "Running LaTeX files through pdflatex..."
115 $(MAKE) -C $(BUILDDIR)/latex all-pdf
116 @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
117
118 latexpdfja:
119 $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
120 @echo "Running LaTeX files through platex and dvipdfmx..."
121 $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
122 @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
123
124 text:
125 $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
126 @echo
127 @echo "Build finished. The text files are in $(BUILDDIR)/text."
128
129 man:
130 $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
131 @echo
132 @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
133
134 texinfo:
135 $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
136 @echo
137 @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
138 @echo "Run \`make' in that directory to run these through makeinfo" \
139 "(use \`make info' here to do that automatically)."
140
141 info:
142 $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
143 @echo "Running Texinfo files through makeinfo..."
144 make -C $(BUILDDIR)/texinfo info
145 @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
146
147 gettext:
148 $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
149 @echo
150 @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
151
152 changes:
153 $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
154 @echo
155 @echo "The overview file is in $(BUILDDIR)/changes."
156
157 linkcheck:
158 $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
159 @echo
160 @echo "Link check complete; look for any errors in the above output " \
161 "or in $(BUILDDIR)/linkcheck/output.txt."
162
163 doctest:
164 $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
165 @echo "Testing of doctests in the sources finished, look at the " \
166 "results in $(BUILDDIR)/doctest/output.txt."
167
168 xml:
169 $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
170 @echo
171 @echo "Build finished. The XML files are in $(BUILDDIR)/xml."
172
173 pseudoxml:
174 $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
175 @echo
176 @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
0 .. include:: ../AUTHORS.rst
0 #!/usr/bin/env python
1 # -*- coding: utf-8 -*-
2 #
3 # googlesearch documentation build configuration file, created by
4 # sphinx-quickstart on Tue Jul 9 22:26:36 2013.
5 #
6 # This file is execfile()d with the current directory set to its
7 # containing dir.
8 #
9 # Note that not all possible configuration values are present in this
10 # autogenerated file.
11 #
12 # All configuration values have a default; values that are commented out
13 # serve to show the default.
14
15 import sys
16 import os
17
18 # If extensions (or modules to document with autodoc) are in another
19 # directory, add these directories to sys.path here. If the directory is
20 # relative to the documentation root, use os.path.abspath to make it
21 # absolute, like shown here.
22 #sys.path.insert(0, os.path.abspath('.'))
23
24 # Get the project root dir, which is the parent dir of this
25 cwd = os.getcwd()
26 project_root = os.path.dirname(cwd)
27
28 # Insert the project root dir as the first element in the PYTHONPATH.
29 # This lets us ensure that the source package is imported, and that its
30 # version is used.
31 sys.path.insert(0, project_root)
32
33 import googlesearch
34
35 # -- General configuration ---------------------------------------------
36
37 # If your documentation needs a minimal Sphinx version, state it here.
38 #needs_sphinx = '1.0'
39
40 # Add any Sphinx extension module names here, as strings. They can be
41 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
42 extensions = ['sphinx.ext.autodoc', 'sphinx.ext.viewcode']
43
44 # Add any paths that contain templates here, relative to this directory.
45 templates_path = ['_templates']
46
47 # The suffix of source filenames.
48 source_suffix = '.rst'
49
50 # The encoding of source files.
51 #source_encoding = 'utf-8-sig'
52
53 # The master toctree document.
54 master_doc = 'index'
55
56 # General information about the project.
57 project = u'google-search'
58 copyright = u"2017, Anthony Hseb"
59
60 # The version info for the project you're documenting, acts as replacement
61 # for |version| and |release|, also used in various other places throughout
62 # the built documents.
63 #
64 # The short X.Y version.
65 version = googlesearch.__version__
66 # The full version, including alpha/beta/rc tags.
67 release = googlesearch.__version__
68
69 # The language for content autogenerated by Sphinx. Refer to documentation
70 # for a list of supported languages.
71 #language = None
72
73 # There are two options for replacing |today|: either, you set today to
74 # some non-false value, then it is used:
75 #today = ''
76 # Else, today_fmt is used as the format for a strftime call.
77 #today_fmt = '%B %d, %Y'
78
79 # List of patterns, relative to source directory, that match files and
80 # directories to ignore when looking for source files.
81 exclude_patterns = ['_build']
82
83 # The reST default role (used for this markup: `text`) to use for all
84 # documents.
85 #default_role = None
86
87 # If true, '()' will be appended to :func: etc. cross-reference text.
88 #add_function_parentheses = True
89
90 # If true, the current module name will be prepended to all description
91 # unit titles (such as .. function::).
92 #add_module_names = True
93
94 # If true, sectionauthor and moduleauthor directives will be shown in the
95 # output. They are ignored by default.
96 #show_authors = False
97
98 # The name of the Pygments (syntax highlighting) style to use.
99 pygments_style = 'sphinx'
100
101 # A list of ignored prefixes for module index sorting.
102 #modindex_common_prefix = []
103
104 # If true, keep warnings as "system message" paragraphs in the built
105 # documents.
106 #keep_warnings = False
107
108
109 # -- Options for HTML output -------------------------------------------
110
111 # The theme to use for HTML and HTML Help pages. See the documentation for
112 # a list of builtin themes.
113 html_theme = 'default'
114
115 # Theme options are theme-specific and customize the look and feel of a
116 # theme further. For a list of options available for each theme, see the
117 # documentation.
118 #html_theme_options = {}
119
120 # Add any paths that contain custom themes here, relative to this directory.
121 #html_theme_path = []
122
123 # The name for this set of Sphinx documents. If None, it defaults to
124 # "<project> v<release> documentation".
125 #html_title = None
126
127 # A shorter title for the navigation bar. Default is the same as
128 # html_title.
129 #html_short_title = None
130
131 # The name of an image file (relative to this directory) to place at the
132 # top of the sidebar.
133 #html_logo = None
134
135 # The name of an image file (within the static path) to use as favicon
136 # of the docs. This file should be a Windows icon file (.ico) being
137 # 16x16 or 32x32 pixels large.
138 #html_favicon = None
139
140 # Add any paths that contain custom static files (such as style sheets)
141 # here, relative to this directory. They are copied after the builtin
142 # static files, so a file named "default.css" will overwrite the builtin
143 # "default.css".
144 html_static_path = ['_static']
145
146 # If not '', a 'Last updated on:' timestamp is inserted at every page
147 # bottom, using the given strftime format.
148 #html_last_updated_fmt = '%b %d, %Y'
149
150 # If true, SmartyPants will be used to convert quotes and dashes to
151 # typographically correct entities.
152 #html_use_smartypants = True
153
154 # Custom sidebar templates, maps document names to template names.
155 #html_sidebars = {}
156
157 # Additional templates that should be rendered to pages, maps page names
158 # to template names.
159 #html_additional_pages = {}
160
161 # If false, no module index is generated.
162 #html_domain_indices = True
163
164 # If false, no index is generated.
165 #html_use_index = True
166
167 # If true, the index is split into individual pages for each letter.
168 #html_split_index = False
169
170 # If true, links to the reST sources are added to the pages.
171 #html_show_sourcelink = True
172
173 # If true, "Created using Sphinx" is shown in the HTML footer.
174 # Default is True.
175 #html_show_sphinx = True
176
177 # If true, "(C) Copyright ..." is shown in the HTML footer.
178 # Default is True.
179 #html_show_copyright = True
180
181 # If true, an OpenSearch description file will be output, and all pages
182 # will contain a <link> tag referring to it. The value of this option
183 # must be the base URL from which the finished HTML is served.
184 #html_use_opensearch = ''
185
186 # This is the file name suffix for HTML files (e.g. ".xhtml").
187 #html_file_suffix = None
188
189 # Output file base name for HTML help builder.
190 htmlhelp_basename = 'googlesearchdoc'
191
192
193 # -- Options for LaTeX output ------------------------------------------
194
195 latex_elements = {
196 # The paper size ('letterpaper' or 'a4paper').
197 #'papersize': 'letterpaper',
198
199 # The font size ('10pt', '11pt' or '12pt').
200 #'pointsize': '10pt',
201
202 # Additional stuff for the LaTeX preamble.
203 #'preamble': '',
204 }
205
206 # Grouping the document tree into LaTeX files. List of tuples
207 # (source start file, target name, title, author, documentclass
208 # [howto/manual]).
209 latex_documents = [
210 ('index', 'googlesearch.tex',
211 u'google-search Documentation',
212 u'Anthony Hseb', 'manual'),
213 ]
214
215 # The name of an image file (relative to this directory) to place at
216 # the top of the title page.
217 #latex_logo = None
218
219 # For "manual" documents, if this is true, then toplevel headings
220 # are parts, not chapters.
221 #latex_use_parts = False
222
223 # If true, show page references after internal links.
224 #latex_show_pagerefs = False
225
226 # If true, show URL addresses after external links.
227 #latex_show_urls = False
228
229 # Documents to append as an appendix to all manuals.
230 #latex_appendices = []
231
232 # If false, no module index is generated.
233 #latex_domain_indices = True
234
235
236 # -- Options for manual page output ------------------------------------
237
238 # One entry per manual page. List of tuples
239 # (source start file, name, description, authors, manual section).
240 man_pages = [
241 ('index', 'googlesearch',
242 u'google-search Documentation',
243 [u'Anthony Hseb'], 1)
244 ]
245
246 # If true, show URL addresses after external links.
247 #man_show_urls = False
248
249
250 # -- Options for Texinfo output ----------------------------------------
251
252 # Grouping the document tree into Texinfo files. List of tuples
253 # (source start file, target name, title, author,
254 # dir menu entry, description, category)
255 texinfo_documents = [
256 ('index', 'googlesearch',
257 u'google-search Documentation',
258 u'Anthony Hseb',
259 'googlesearch',
260 'One line description of project.',
261 'Miscellaneous'),
262 ]
263
264 # Documents to append as an appendix to all manuals.
265 #texinfo_appendices = []
266
267 # If false, no module index is generated.
268 #texinfo_domain_indices = True
269
270 # How to display URL addresses: 'footnote', 'no', or 'inline'.
271 #texinfo_show_urls = 'footnote'
272
273 # If true, do not generate a @detailmenu in the "Top" node's menu.
274 #texinfo_no_detailmenu = False
0 .. include:: ../CONTRIBUTING.rst
0 .. include:: ../HISTORY.rst
0 Welcome to google-search's documentation!
1 =========================================
2
3 Contents:
4
5 .. toctree::
6 :maxdepth: 2
7
8 readme
9 installation
10 usage
11 contributing
12 authors
13 history
13
14 Indices and tables
15 ==================
16
17 * :ref:`genindex`
18 * :ref:`modindex`
19 * :ref:`search`
0 .. highlight:: shell
1
2 ============
3 Installation
4 ============
5
6
7 Stable release
8 --------------
9
10 To install google-search, run this command in your terminal:
11
12 .. code-block:: console
13
14 $ pip install google-search
15
16 This is the preferred method to install google-search, as it will always install the most recent stable release.
17
18 If you don't have `pip`_ installed, this `Python installation guide`_ can guide
19 you through the process.
20
21 .. _pip: https://pip.pypa.io
22 .. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/
23
24
25 From sources
26 ------------
27
28 The sources for google-search can be downloaded from the `Github repo`_.
29
30 You can either clone the public repository:
31
32 .. code-block:: console
33
34 $ git clone git://github.com/anthonyhseb/googlesearch
35
36 Or download the `tarball`_:
37
38 .. code-block:: console
39
40 $ curl -OL https://github.com/anthonyhseb/googlesearch/tarball/master
41
42 Once you have a copy of the source, you can install it with:
43
44 .. code-block:: console
45
46 $ python setup.py install
47
48
49 .. _Github repo: https://github.com/anthonyhseb/googlesearch
50 .. _tarball: https://github.com/anthonyhseb/googlesearch/tarball/master
0 @ECHO OFF
1
2 REM Command file for Sphinx documentation
3
4 if "%SPHINXBUILD%" == "" (
5 set SPHINXBUILD=sphinx-build
6 )
7 set BUILDDIR=_build
8 set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
9 set I18NSPHINXOPTS=%SPHINXOPTS% .
10 if NOT "%PAPER%" == "" (
11 set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
12 set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
13 )
14
15 if "%1" == "" goto help
16
17 if "%1" == "help" (
18 :help
19 echo.Please use `make ^<target^>` where ^<target^> is one of
20 echo. html to make standalone HTML files
21 echo. dirhtml to make HTML files named index.html in directories
22 echo. singlehtml to make a single large HTML file
23 echo. pickle to make pickle files
24 echo. json to make JSON files
25 echo. htmlhelp to make HTML files and a HTML help project
26 echo. qthelp to make HTML files and a qthelp project
27 echo. devhelp to make HTML files and a Devhelp project
28 echo. epub to make an epub
29 echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
30 echo. text to make text files
31 echo. man to make manual pages
32 echo. texinfo to make Texinfo files
33 echo. gettext to make PO message catalogs
34 echo. changes to make an overview over all changed/added/deprecated items
35 echo. xml to make Docutils-native XML files
36 echo. pseudoxml to make pseudoxml-XML files for display purposes
37 echo. linkcheck to check all external links for integrity
38 echo. doctest to run all doctests embedded in the documentation if enabled
39 goto end
40 )
41
42 if "%1" == "clean" (
43 for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
44 del /q /s %BUILDDIR%\*
45 goto end
46 )
47
48
49 %SPHINXBUILD% 2> nul
50 if errorlevel 9009 (
51 echo.
52 echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
53 echo.installed, then set the SPHINXBUILD environment variable to point
54 echo.to the full path of the 'sphinx-build' executable. Alternatively you
55 echo.may add the Sphinx directory to PATH.
56 echo.
57 echo.If you don't have Sphinx installed, grab it from
58 echo.http://sphinx-doc.org/
59 exit /b 1
60 )
61
62 if "%1" == "html" (
63 %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
64 if errorlevel 1 exit /b 1
65 echo.
66 echo.Build finished. The HTML pages are in %BUILDDIR%/html.
67 goto end
68 )
69
70 if "%1" == "dirhtml" (
71 %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
72 if errorlevel 1 exit /b 1
73 echo.
74 echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
75 goto end
76 )
77
78 if "%1" == "singlehtml" (
79 %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
80 if errorlevel 1 exit /b 1
81 echo.
82 echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
83 goto end
84 )
85
86 if "%1" == "pickle" (
87 %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
88 if errorlevel 1 exit /b 1
89 echo.
90 echo.Build finished; now you can process the pickle files.
91 goto end
92 )
93
94 if "%1" == "json" (
95 %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
96 if errorlevel 1 exit /b 1
97 echo.
98 echo.Build finished; now you can process the JSON files.
99 goto end
100 )
101
102 if "%1" == "htmlhelp" (
103 %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
104 if errorlevel 1 exit /b 1
105 echo.
106 echo.Build finished; now you can run HTML Help Workshop with the ^
107 .hhp project file in %BUILDDIR%/htmlhelp.
108 goto end
109 )
110
111 if "%1" == "qthelp" (
112 %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
113 if errorlevel 1 exit /b 1
114 echo.
115 echo.Build finished; now you can run "qcollectiongenerator" with the ^
116 .qhcp project file in %BUILDDIR%/qthelp, like this:
117 echo.^> qcollectiongenerator %BUILDDIR%\qthelp\googlesearch.qhcp
118 echo.To view the help file:
119 echo.^> assistant -collectionFile %BUILDDIR%\qthelp\googlesearch.qhc
120 goto end
121 )
122
123 if "%1" == "devhelp" (
124 %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
125 if errorlevel 1 exit /b 1
126 echo.
127 echo.Build finished.
128 goto end
129 )
130
131 if "%1" == "epub" (
132 %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
133 if errorlevel 1 exit /b 1
134 echo.
135 echo.Build finished. The epub file is in %BUILDDIR%/epub.
136 goto end
137 )
138
139 if "%1" == "latex" (
140 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
141 if errorlevel 1 exit /b 1
142 echo.
143 echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
144 goto end
145 )
146
147 if "%1" == "latexpdf" (
148 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
149 cd %BUILDDIR%/latex
150 make all-pdf
151 cd %BUILDDIR%/..
152 echo.
153 echo.Build finished; the PDF files are in %BUILDDIR%/latex.
154 goto end
155 )
156
157 if "%1" == "latexpdfja" (
158 %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
159 cd %BUILDDIR%/latex
160 make all-pdf-ja
161 cd %BUILDDIR%/..
162 echo.
163 echo.Build finished; the PDF files are in %BUILDDIR%/latex.
164 goto end
165 )
166
167 if "%1" == "text" (
168 %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
169 if errorlevel 1 exit /b 1
170 echo.
171 echo.Build finished. The text files are in %BUILDDIR%/text.
172 goto end
173 )
174
175 if "%1" == "man" (
176 %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
177 if errorlevel 1 exit /b 1
178 echo.
179 echo.Build finished. The manual pages are in %BUILDDIR%/man.
180 goto end
181 )
182
183 if "%1" == "texinfo" (
184 %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
185 if errorlevel 1 exit /b 1
186 echo.
187 echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
188 goto end
189 )
190
191 if "%1" == "gettext" (
192 %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
193 if errorlevel 1 exit /b 1
194 echo.
195 echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
196 goto end
197 )
198
199 if "%1" == "changes" (
200 %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
201 if errorlevel 1 exit /b 1
202 echo.
203 echo.The overview file is in %BUILDDIR%/changes.
204 goto end
205 )
206
207 if "%1" == "linkcheck" (
208 %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
209 if errorlevel 1 exit /b 1
210 echo.
211 echo.Link check complete; look for any errors in the above output ^
212 or in %BUILDDIR%/linkcheck/output.txt.
213 goto end
214 )
215
216 if "%1" == "doctest" (
217 %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
218 if errorlevel 1 exit /b 1
219 echo.
220 echo.Testing of doctests in the sources finished, look at the ^
221 results in %BUILDDIR%/doctest/output.txt.
222 goto end
223 )
224
225 if "%1" == "xml" (
226 %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
227 if errorlevel 1 exit /b 1
228 echo.
229 echo.Build finished. The XML files are in %BUILDDIR%/xml.
230 goto end
231 )
232
233 if "%1" == "pseudoxml" (
234 %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
235 if errorlevel 1 exit /b 1
236 echo.
237 echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
238 goto end
239 )
240
241 :end
0 .. include:: ../README.rst
0 =====
1 Usage
2 =====
3
4 To use google-search in a project::
5
6 from googlesearch.googlesearch import GoogleSearch
7 response = GoogleSearch().search("something")
8 for result in response.results:
9 print("Title: " + result.title)
10 print("URL: " + result.url)
11 print("Content: " + result.getText())
12 print("Html: " + result.getMarkup())
+0
-18
google.egg-info/PKG-INFO
0 Metadata-Version: 1.1
1 Name: google
2 Version: 2.0.3
3 Summary: Python bindings to the Google search engine.
4 Home-page: http://breakingcode.wordpress.com/
5 Author: Mario Vilas
6 Author-email: [email protected]
7 License: UNKNOWN
8 Description: UNKNOWN
9 Platform: UNKNOWN
10 Classifier: Development Status :: 5 - Production/Stable
11 Classifier: Intended Audience :: Developers
12 Classifier: License :: OSI Approved :: BSD License
13 Classifier: Environment :: Console
14 Classifier: Programming Language :: Python
15 Classifier: Topic :: Software Development :: Libraries :: Python Modules
16 Requires: beautifulsoup4
17 Provides: googlesearch
+0
-13
google.egg-info/SOURCES.txt
0 MANIFEST.in
1 README.md
2 requirements.txt
3 setup.cfg
4 setup.py
5 google.egg-info/PKG-INFO
6 google.egg-info/SOURCES.txt
7 google.egg-info/dependency_links.txt
8 google.egg-info/requires.txt
9 google.egg-info/top_level.txt
10 googlesearch/__init__.py
11 googlesearch/user_agents.txt.gz
12 scripts/google
+0
-1
google.egg-info/dependency_links.txt
0
+0
-1
google.egg-info/requires.txt
0 beautifulsoup4
+0
-1
google.egg-info/top_level.txt
0 googlesearch
0 Metadata-Version: 2.1
1 Name: google-search
2 Version: 1.1.1
3 Summary: Library for scraping google search results
4 Home-page: https://github.com/anthonyhseb/googlesearch
5 Author: Anthony Hseb
6 Author-email: [email protected]
7 License: MIT license
8 Keywords: googlesearch
9 Classifier: Development Status :: 2 - Pre-Alpha
10 Classifier: Intended Audience :: Developers
11 Classifier: License :: OSI Approved :: MIT License
12 Classifier: Natural Language :: English
13 Classifier: Programming Language :: Python :: 2
14 Classifier: Programming Language :: Python :: 2.7
15 Classifier: Programming Language :: Python :: 3
16 Classifier: Programming Language :: Python :: 3.6
17 Classifier: Programming Language :: Python :: 3.8
18 License-File: LICENSE
19 License-File: AUTHORS.rst
20
21 =============
22 google-search
23 =============
24
25
26 .. image:: https://img.shields.io/pypi/v/google-search.svg
27 :target: https://pypi.python.org/pypi/google-search
28
29 .. image:: https://img.shields.io/travis/anthonyhseb/googlesearch.svg
30 :target: https://travis-ci.org/anthonyhseb/googlesearch
31
32 .. image:: https://readthedocs.org/projects/googlesearch/badge/?version=latest
33 :target: https://googlesearch.readthedocs.io/en/latest/?badge=latest
34 :alt: Documentation Status
35
36 .. image:: https://pyup.io/repos/github/anthonyhseb/googlesearch/shield.svg
37 :target: https://pyup.io/repos/github/anthonyhseb/googlesearch/
38 :alt: Updates
39
40
41 Library for scraping Google search results.
42
43 * Usage::
44
45 from googlesearch.googlesearch import GoogleSearch
46 response = GoogleSearch().search("something")
47 for result in response.results:
48 print("Title: " + result.title)
49 print("Content: " + result.getText())
50
51
52
53 * Free software: MIT license
54
55 Features
56 --------
57
58 Run a Google search and fetch the individual results (full HTML and text contents). By default, the result URLs are fetched eagerly when the search request is made, using 10 parallel requests. Fetching can be deferred until ``searchResult.getText()`` or ``getMarkup()`` is called, by passing ``prefetch_results = False`` to the search method.
59
60 Pass ``num_results`` to the search method to set the maximum number of results.
61
62 ``SearchReponse.total`` gives the total number of results on Google.
63
64 Credits
65 ---------
66
67 This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
68
69 .. _Cookiecutter: https://github.com/audreyr/cookiecutter
70 .. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
71
72
73
74 =======
75 History
76 =======
77
78 1.0.0 (2017-05-06)
79 ------------------
80
81 * First release on PyPI.
0 AUTHORS.rst
1 CONTRIBUTING.rst
2 HISTORY.rst
3 LICENSE
4 MANIFEST.in
5 README.rst
6 setup.cfg
7 setup.py
8 docs/Makefile
9 docs/authors.rst
10 docs/conf.py
11 docs/contributing.rst
12 docs/history.rst
13 docs/index.rst
14 docs/installation.rst
15 docs/make.bat
16 docs/readme.rst
17 docs/usage.rst
18 google_search.egg-info/PKG-INFO
19 google_search.egg-info/SOURCES.txt
20 google_search.egg-info/dependency_links.txt
21 google_search.egg-info/not-zip-safe
22 google_search.egg-info/requires.txt
23 google_search.egg-info/top_level.txt
24 googlesearch/__init__.py
25 googlesearch/browser_agents.txt
26 googlesearch/googlesearch.py
27 tests/__init__.py
28 tests/test_googlesearch.py
0 beautifulsoup4
1 lxml
2 soupsieve
0 #!/usr/bin/env python
1
2 # Python bindings to the Google search engine
3 # Copyright (c) 2009-2019, Mario Vilas
4 # All rights reserved.
5 #
6 # Redistribution and use in source and binary forms, with or without
7 # modification, are permitted provided that the following conditions are met:
8 #
9 # * Redistributions of source code must retain the above copyright notice,
10 # this list of conditions and the following disclaimer.
11 # * Redistributions in binary form must reproduce the above copyright
12 # notice,this list of conditions and the following disclaimer in the
13 # documentation and/or other materials provided with the distribution.
14 # * Neither the name of the copyright holder nor the names of its
15 # contributors may be used to endorse or promote products derived from
16 # this software without specific prior written permission.
17 #
18 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
22 # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23 # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24 # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25 # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26 # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27 # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28 # POSSIBILITY OF SUCH DAMAGE.
29
30 import os
31 import random
32 import sys
33 import time
34
35 if sys.version_info[0] > 2:
36 from http.cookiejar import LWPCookieJar
37 from urllib.request import Request, urlopen
38 from urllib.parse import quote_plus, urlparse, parse_qs
39 else:
40 from cookielib import LWPCookieJar
41 from urllib import quote_plus
42 from urllib2 import Request, urlopen
43 from urlparse import urlparse, parse_qs
44
45 try:
46 from bs4 import BeautifulSoup
47 is_bs4 = True
48 except ImportError:
49 from BeautifulSoup import BeautifulSoup
50 is_bs4 = False
51
52 __all__ = [
53
54 # Main search function.
55 'search',
56
57 # Specialized search functions.
58 'search_images', 'search_news',
59 'search_videos', 'search_shop',
60 'search_books', 'search_apps',
61
62 # Shortcut for "get lucky" search.
63 'lucky',
64
65 # Miscellaneous utility functions.
66 'get_random_user_agent', 'get_tbs',
67 ]
68
69 # URL templates to make Google searches.
70 url_home = "https://www.google.%(tld)s/"
71 url_search = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \
72 "btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&tbm=%(tpe)s&" \
73 "cr=%(country)s"
74 url_next_page = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \
75 "start=%(start)d&tbs=%(tbs)s&safe=%(safe)s&tbm=%(tpe)s&" \
76 "cr=%(country)s"
77 url_search_num = "https://www.google.%(tld)s/search?hl=%(lang)s&q=%(query)s&" \
78 "num=%(num)d&btnG=Google+Search&tbs=%(tbs)s&safe=%(safe)s&" \
79 "tbm=%(tpe)s&cr=%(country)s"
80 url_next_page_num = "https://www.google.%(tld)s/search?hl=%(lang)s&" \
81 "q=%(query)s&num=%(num)d&start=%(start)d&tbs=%(tbs)s&" \
82 "safe=%(safe)s&tbm=%(tpe)s&cr=%(country)s"
83 url_parameters = (
84 'hl', 'q', 'num', 'btnG', 'start', 'tbs', 'safe', 'tbm', 'cr')
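
# Example (illustrative): with tld='com', lang='en', query='foo+bar', tbs='0',
# safe='off', tpe='' and country='', url_search above expands to:
# https://www.google.com/search?hl=en&q=foo+bar&btnG=Google+Search&tbs=0&safe=off&tbm=&cr=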
85
86 # Cookie jar. Stored at the user's home folder.
87 # If the cookie jar is inaccessible, the errors are ignored.
88 home_folder = os.getenv('HOME')
89 if not home_folder:
90 home_folder = os.getenv('USERHOME')
91 if not home_folder:
92 home_folder = '.' # Use the current folder on error.
93 cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
94 try:
95 cookie_jar.load()
96 except Exception:
97 pass
98
99 # Default user agent, unless instructed by the user to change it.
100 USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
101
102 # Load the list of valid user agents from the install folder.
103 # The search order is:
104 # * user_agents.txt.gz
105 # * user_agents.txt
106 # * default user agent
107 try:
108 install_folder = os.path.abspath(os.path.split(__file__)[0])
109 try:
110 user_agents_file = os.path.join(install_folder, 'user_agents.txt.gz')
111 import gzip
112 fp = gzip.open(user_agents_file, 'rb')
113 try:
114 user_agents_list = [_.strip() for _ in fp.readlines()]
115 finally:
116 fp.close()
117 del fp
118 except Exception:
119 user_agents_file = os.path.join(install_folder, 'user_agents.txt')
120 with open(user_agents_file) as fp:
121 user_agents_list = [_.strip() for _ in fp.readlines()]
122 except Exception:
123 user_agents_list = [USER_AGENT]
124
125
126 # Get a random user agent.
127 def get_random_user_agent():
128 """
129 Get a random user agent string.
130
131 :rtype: str
132 :return: Random user agent string.
133 """
134 return random.choice(user_agents_list)
135
136
137 # Helper function to format the tbs parameter.
138 def get_tbs(from_date, to_date):
139 """
140 Helper function to format the tbs parameter.
141
142 :param datetime.date from_date: Python date object.
143 :param datetime.date to_date: Python date object.
144
145 :rtype: str
146 :return: Dates encoded in tbs format.
147 """
148 from_date = from_date.strftime('%m/%d/%Y')
149 to_date = to_date.strftime('%m/%d/%Y')
150 return 'cdr:1,cd_min:%(from_date)s,cd_max:%(to_date)s' % vars()
151
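# Example (illustrative): get_tbs(datetime.date(2019, 1, 1), datetime.date(2019, 6, 30))
# returns 'cdr:1,cd_min:01/01/2019,cd_max:06/30/2019', which can be passed as
# the tbs argument of search() to restrict results to that date range.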
152
153 # Request the given URL and return the response page, using the cookie jar.
154 # If the cookie jar is inaccessible, the errors are ignored.
155 def get_page(url, user_agent=None):
156 """
157 Request the given URL and return the response page, using the cookie jar.
158
159 :param str url: URL to retrieve.
160 :param str user_agent: User agent for the HTTP requests.
161 Use None for the default.
162
163 :rtype: str
164 :return: Web page retrieved for the given URL.
165
166 :raises IOError: An exception is raised on error.
167 :raises urllib2.URLError: An exception is raised on error.
168 :raises urllib2.HTTPError: An exception is raised on error.
169 """
170 if user_agent is None:
171 user_agent = USER_AGENT
172 request = Request(url)
173 request.add_header('User-Agent', user_agent)
174 cookie_jar.add_cookie_header(request)
175 response = urlopen(request)
176 cookie_jar.extract_cookies(response, request)
177 html = response.read()
178 response.close()
179 try:
180 cookie_jar.save()
181 except Exception:
182 pass
183 return html
184
185
186 # Filter links found in the Google result pages HTML code.
187 # Returns None if the link doesn't yield a valid result.
188 def filter_result(link):
189 try:
190
191 # Decode hidden URLs.
192 if link.startswith('/url?'):
193 o = urlparse(link, 'http')
194 link = parse_qs(o.query)['q'][0]
195
196 # Valid results are absolute URLs not pointing to a Google domain,
197 # like images.google.com or googleusercontent.com for example.
198 # TODO this could be improved!
199 o = urlparse(link, 'http')
200 if o.netloc and 'google' not in o.netloc:
201 return link
202
203 # On error, return None.
204 except Exception:
205 pass
206
207
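# Example (illustrative): filter_result('/url?q=http://example.com/&sa=U')
# returns 'http://example.com/', while relative links such as '/search?q=foo'
# and links to Google domains fall through and return None.
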
208 # Returns a generator that yields URLs.
209 def search(query, tld='com', lang='en', tbs='0', safe='off', num=10, start=0,
210 stop=None, domains=None, pause=2.0, tpe='', country='',
211 extra_params=None, user_agent=None):
212 """
213 Search the given query string using Google.
214
215 :param str query: Query string. Must NOT be url-encoded.
216 :param str tld: Top level domain.
217 :param str lang: Language.
218 :param str tbs: Time limits (i.e. "qdr:h" => last hour,
219 "qdr:d" => last 24 hours, "qdr:m" => last month).
220 :param str safe: Safe search.
221 :param int num: Number of results per page.
222 :param int start: First result to retrieve.
223 :param int stop: Last result to retrieve.
224 Use None to keep searching forever.
225 :param list domains: A list of web domains to constrain
226 the search.
227 :param float pause: Lapse to wait between HTTP requests.
228 A lapse too long will make the search slow, but a lapse too short may
229 cause Google to block your IP. Your mileage may vary!
230 :param str tpe: Search type (images, videos, news, shopping, books, apps)
231 Use the following values {videos: 'vid', images: 'isch',
232 news: 'nws', shopping: 'shop', books: 'bks', applications: 'app'}
233 :param str country: Country or region to focus the search on. Similar to
234 changing the TLD, but does not yield exactly the same results.
235 Only Google knows why...
236 :param dict extra_params: A dictionary of extra HTTP GET
237 parameters, which must be URL encoded. For example if you don't want
238 Google to filter similar results you can set the extra_params to
239 {'filter': '0'} which will append '&filter=0' to every query.
240 :param str user_agent: User agent for the HTTP requests.
241 Use None for the default.
242
243 :rtype: generator of str
244 :return: Generator (iterator) that yields found URLs.
245 If the stop parameter is None the iterator will loop forever.
246 """
247 # Set of hashes for the results found.
248 # This is used to avoid repeated results.
249 hashes = set()
250
251 # Count the number of links yielded.
252 count = 0
253
254 # Prepare domain list if it exists.
255 if domains:
256 query = query + ' ' + ' OR '.join(
257 'site:' + domain for domain in domains)
258
259 # Prepare the search string.
260 query = quote_plus(query)
261
262 # If no extra_params is given, create an empty dictionary.
263 # We should avoid using an empty dictionary as a default value
264 # in a function parameter in Python.
265 if not extra_params:
266 extra_params = {}
267
268 # Check extra_params for overlapping.
269 for builtin_param in url_parameters:
270 if builtin_param in extra_params.keys():
271 raise ValueError(
272 'GET parameter "%s" is overlapping with '
273 'the built-in GET parameter'
274 % builtin_param
275 )
276
277 # Grab the cookie from the home page.
278 get_page(url_home % vars(), user_agent)
279
280 # Prepare the URL of the first request.
281 if start:
282 if num == 10:
283 url = url_next_page % vars()
284 else:
285 url = url_next_page_num % vars()
286 else:
287 if num == 10:
288 url = url_search % vars()
289 else:
290 url = url_search_num % vars()
291
292 # Loop until we reach the maximum result, if any (otherwise, loop forever).
293 while not stop or count < stop:
294
295 # Remember last count to detect the end of results.
296 last_count = count
297
298 # Append extra GET parameters to the URL.
299 # This is done on every iteration because we're
300 # rebuilding the entire URL at the end of this loop.
301 for k, v in extra_params.items():
302 k = quote_plus(k)
303 v = quote_plus(v)
304 url = url + ('&%s=%s' % (k, v))
305
306 # Sleep between requests.
307 # Keeps Google from banning you for making too many requests.
308 time.sleep(pause)
309
310 # Request the Google Search results page.
311 html = get_page(url, user_agent)
312
313 # Parse the response and get every anchored URL.
314 if is_bs4:
315 soup = BeautifulSoup(html, 'html.parser')
316 else:
317 soup = BeautifulSoup(html)
318 try:
319 anchors = soup.find(id='search').findAll('a')
320 # Sometimes (depending on the User-agent) there is
321 # no id "search" in html response...
322 except AttributeError:
323 # Remove links of the top bar.
324 gbar = soup.find(id='gbar')
325 if gbar:
326 gbar.clear()
327 anchors = soup.findAll('a')
328
329 # Process every anchored URL.
330 for a in anchors:
331
332 # Get the URL from the anchor tag.
333 try:
334 link = a['href']
335 except KeyError:
336 continue
337
338 # Filter invalid links and links pointing to Google itself.
339 link = filter_result(link)
340 if not link:
341 continue
342
343 # Discard repeated results.
344 h = hash(link)
345 if h in hashes:
346 continue
347 hashes.add(h)
348
349 # Yield the result.
350 yield link
351
352 # Increase the results counter.
353 # If we reached the limit, stop.
354 count += 1
355 if stop and count >= stop:
356 return
357
358 # End if there are no more results.
359 # XXX TODO review this logic, not sure if this is still true!
360 if last_count == count:
361 break
362
363 # Prepare the URL for the next request.
364 start += num
365 if num == 10:
366 url = url_next_page % vars()
367 else:
368 url = url_next_page_num % vars()
369
370
371 # Shortcut to search images.
372 # Beware, this does not return the image link.
373 def search_images(*args, **kwargs):
374 """
375 Shortcut to search images.
376
377 Same arguments and return value as the main search function.
378
379 :note: Beware, this does not return the image link.
380 """
381 kwargs['tpe'] = 'isch'
382 return search(*args, **kwargs)
383
384
385 # Shortcut to search news.
386 def search_news(*args, **kwargs):
387 """
388 Shortcut to search news.
389
390 Same arguments and return value as the main search function.
391 """
392 kwargs['tpe'] = 'nws'
393 return search(*args, **kwargs)
394
395
396 # Shortcut to search videos.
397 def search_videos(*args, **kwargs):
398 """
399 Shortcut to search videos.
400
401 Same arguments and return value as the main search function.
402 """
403 kwargs['tpe'] = 'vid'
404 return search(*args, **kwargs)
405
406
407 # Shortcut to search shop.
408 def search_shop(*args, **kwargs):
409 """
410 Shortcut to search shop.
411
412 Same arguments and return value as the main search function.
413 """
414 kwargs['tpe'] = 'shop'
415 return search(*args, **kwargs)
416
417
418 # Shortcut to search books.
419 def search_books(*args, **kwargs):
420 """
421 Shortcut to search books.
422
423 Same arguments and return value as the main search function.
424 """
425 kwargs['tpe'] = 'bks'
426 return search(*args, **kwargs)
427
428
429 # Shortcut to search apps.
430 def search_apps(*args, **kwargs):
431 """
432 Shortcut to search apps.
433
434 Same arguments and return value as the main search function.
435 """
436 kwargs['tpe'] = 'app'
437 return search(*args, **kwargs)
438
439
440 # Shortcut to single-item search.
441 # Evaluates the iterator to return the single URL as a string.
442 def lucky(*args, **kwargs):
443 """
444 Shortcut to single-item search.
445
446 Same arguments as the main search function, but the return value changes.
447
448 :rtype: str
449 :return: URL found by Google.
450 """
451 return next(search(*args, **kwargs))
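

# Example (illustrative sketch): exercising the module-level helpers defined
# above; the query string and the stop/pause values are arbitrary.
#
# for url in search('"Breaking Code" WordPress blog', stop=20, pause=2.0):
#     print(url)
#
# first_hit = lucky('python unittest documentation')  # a single URL, as a str
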
0 __version__ = "1.0.0"
0 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36
1 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
2 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
3 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
4 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
5 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
6 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
7 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
8 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
9 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
10 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
11 Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
12 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
13 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
14 Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
15 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
16 Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko
17 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
18 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
19 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 Edge/15.15063
20 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
21 Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
22 Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
23 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0
24 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
25 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299
26 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:57.0) Gecko/20100101 Firefox/57.0
27 Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
28 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
29 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
30 Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
31 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
32 Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
33 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36 OPR/49.0.2725.47
34 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
35 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
36 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0.1 Safari/604.3.5
37 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
38 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:56.0) Gecko/20100101 Firefox/56.0
39 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/600.5.17 (KHTML, like Gecko) Version/8.0.5 Safari/600.5.17
40 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38
41 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
42 Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
43 Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
44 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
45 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38
46 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
47 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
48 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:57.0) Gecko/20100101 Firefox/57.0
49 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0
50 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
51 Mozilla/5.0 (Windows NT 6.1; rv:57.0) Gecko/20100101 Firefox/57.0
52 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
53 Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0
54 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
55 Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
56 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36 OPR/49.0.2725.39
57 Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
58 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8
59 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8
60 Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0
61 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393
62 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36 OPR/48.0.2685.52
63 Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
64 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.75 Chrome/62.0.3202.75 Safari/537.36
65 Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
66 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
67 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
68 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
69 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36
70 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.94 Chrome/62.0.3202.94 Safari/537.36
71 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; Trident/5.0)
72 Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko
73 Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0
74 Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
75 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
76 Mozilla/5.0 (iPad; CPU OS 11_1_2 like Mac OS X) AppleWebKit/604.3.5 (KHTML, like Gecko) Version/11.0 Mobile/15B202 Safari/604.1
77 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36
78 Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
79 Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0; Trident/5.0)
80 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
81 Mozilla/5.0 (Windows NT 6.1; rv:56.0) Gecko/20100101 Firefox/56.0
82 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
83 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36
84 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4
85 Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0
86 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:56.0) Gecko/20100101 Firefox/56.0
87 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/62.0.3202.89 Chrome/62.0.3202.89 Safari/537.36
88 Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
89 Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
90 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36
91 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36
92 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36
93 Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0
94 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
0 # Contributors:
1 # https://github.com/anthonyhseb
2 # https://github.com/rakeshsagalagatte
3 # https://github.com/hildogjr
4
5 import sys
6 if sys.version_info[0] > 2:
7 import urllib.request as urllib
8 else:
9 import urllib2 as urllib
10 import math
11 import re
12 from bs4 import BeautifulSoup
13 from multiprocessing.pool import ThreadPool  # Used to prefetch result pages in parallel.
14 from random import choice
15 from time import sleep
16 from pkg_resources import resource_filename
17 from contextlib import closing
18
19 class GoogleSearch:
20 with open(resource_filename('googlesearch', 'browser_agents.txt'), 'r') as file_handle:
21 USER_AGENTS = file_handle.read().splitlines()
22 SEARCH_URL = "https://google.com/search"
23 RESULT_SELECTOR = "div.g"
24 RESULT_SELECTOR_PAGE1 = "div.g>div>div[id][data-ved]"
25 TOTAL_SELECTOR = "#result-stats"
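    # NB: these CSS selectors target Google's result-page markup as of this
    # snapshot; Google changes that markup frequently, so they may need updating.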
26 RESULTS_PER_PAGE = 10
27 DEFAULT_HEADERS = [
28 ('User-Agent', choice(USER_AGENTS)),
29 ("Accept-Language", "en-US,en;q=0.5"),
30 ]
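    # Note: choice() runs once, when the class body executes at import time,
    # so every request in a given process reuses the same User-Agent string.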
31
32     def search(self,
33                query,
34                num_results=10,
35                prefetch_pages=True,
36                num_prefetch_threads=10):
37         '''Perform the Google search.
38
39         Parameters:
40             query: string to search for.
41             num_results: minimum number of results at which the search stops.
42             prefetch_pages: whether to prefetch the result pages in parallel.
43             num_prefetch_threads: number of threads used to prefetch the pages.
44                 (No pause is inserted between requests, so heavy use may risk an IP block.)
45         '''
46 search_results = []
47 pages = int(math.ceil(num_results / float(GoogleSearch.RESULTS_PER_PAGE)))
48 total = None
49 thread_pool = None
50 if prefetch_pages:
51 thread_pool = ThreadPool(num_prefetch_threads)
52 for i in range(pages):
53 start = i * GoogleSearch.RESULTS_PER_PAGE
54 opener = urllib.build_opener()
55 opener.addheaders = GoogleSearch.DEFAULT_HEADERS
56 with closing(opener.open(GoogleSearch.SEARCH_URL +
57 "?hl=en&q="+ urllib.quote(query) +
58 ("" if start == 0 else
59 ("&start=" + str(start))))) as response:
60 soup = BeautifulSoup(response.read(), "lxml")
61 if total is None:
62 if sys.version_info[0] > 2:
63 totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.__next__()
64 else:
65 totalText = soup.select(GoogleSearch.TOTAL_SELECTOR)[0].children.next()
66 total = int(re.sub("[', ]", "",
67 re.search("(([0-9]+[', ])*[0-9]+)",
68 totalText).group(1)))
69 selector = GoogleSearch.RESULT_SELECTOR_PAGE1 if i == 0 else GoogleSearch.RESULT_SELECTOR
70 self.results = self.parse_results(soup.select(selector), i)
71 # if len(search_results) + len(self.results) > num_results:
72 # del self.results[num_results - len(search_results):]
73 search_results += self.results
74 if prefetch_pages:
75 thread_pool.map_async(SearchResult.get_text, self.results)
76 if prefetch_pages:
77 thread_pool.close()
78 thread_pool.join()
79 return SearchResponse(search_results, total)
80
81 def parse_results(self, results, page):
82 search_results = []
83 for result in results:
84 if page == 0:
85 result = result.parent
86 else:
87 result = result.find("div")
88 h3 = result.find("h3")
89 if h3 is None:
90 continue
91 url = h3.parent["href"]
92 title = h3.text
93 search_results.append(SearchResult(title, url))
94 return search_results
95
96 class SearchResponse:
97 def __init__(self, results, total):
98 self.results = results
99 self.total = total
100
101 class SearchResult:
102 def __init__(self, title, url):
103 self.title = title
104 self.url = url
105 self.__text = None
106 self.__markup = None
107
108 def get_text(self):
109 if self.__text is None:
110 soup = BeautifulSoup(self.get_markup(), "lxml")
111 for junk in soup(['style', 'script', 'head', 'title', 'meta']):
112 junk.extract()
113 self.__text = soup.get_text()
114 return self.__text
115
116 def get_markup(self):
117 if self.__markup is None:
118 opener = urllib.build_opener()
119 opener.addheaders = GoogleSearch.DEFAULT_HEADERS
120 response = opener.open(self.url)
121 self.__markup = response.read()
122 return self.__markup
123
124 def __str__(self):
125 return str(self.__dict__)
126 def __unicode__(self):
127 return str(self.__str__())
128 def __repr__(self):
129 return self.__str__()
130
131
132 # Main entry for test and external script use.
133 if __name__ == "__main__":
134 import sys
135 if len(sys.argv) == 1: # Only the file name.
136 query = "python"
137 else:
138 query = " ".join(sys.argv[1:])
139 search = GoogleSearch()
140 num_results = 10
141     print("Fetching first " + str(num_results) + " results for \"" + query + "\"...")
142     response = search.search(query, num_results, prefetch_pages=True)
143     print("TOTAL: " + str(response.total) + " RESULTS")
144     for count, result in enumerate(response.results):
145         print("RESULT #" + str(count + 1) + ":")
146         print((result._SearchResult__text.strip()
147                if result._SearchResult__text is not None else "[None]") + "\n\n")
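
Because `get_text()` and `get_markup()` fetch lazily and cache their results,
the class can also be used without prefetching. A minimal sketch, assuming
network access and that Google's markup still matches the selectors above (the
query is an arbitrary example)::

    from googlesearch.googlesearch import GoogleSearch

    response = GoogleSearch().search("python unittest", num_results=10,
                                     prefetch_pages=False)
    print("total:", response.total)
    for result in response.results:
        print(result.title, "->", result.url)
        text = result.get_text()  # fetched on first access, then cached
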
googlesearch/user_agents.txt.gz (removed; binary file, diff not shown)
requirements.txt (removed; 1 deletion)
0 beautifulsoup4>=4.0
scripts/google (removed; 137 deletions)
0 #!/usr/bin/env python
1
2 # Python bindings to the Google search engine
3 # Copyright (c) 2009-2019, Mario Vilas
4 # All rights reserved.
5 #
6 # Redistribution and use in source and binary forms, with or without
7 # modification, are permitted provided that the following conditions are met:
8 #
9 # * Redistributions of source code must retain the above copyright notice,
10 # this list of conditions and the following disclaimer.
11 # * Redistributions in binary form must reproduce the above copyright
12 # notice, this list of conditions and the following disclaimer in the
13 # documentation and/or other materials provided with the distribution.
14 # * Neither the name of the copyright holder nor the names of its
15 # contributors may be used to endorse or promote products derived from
16 # this software without specific prior written permission.
17 #
18 # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
22 # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23 # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24 # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25 # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26 # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27 # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28 # POSSIBILITY OF SUCH DAMAGE.
29
30 import sys
31
32 from googlesearch import search, get_random_user_agent
33
34 # TODO port to argparse
35 from optparse import OptionParser, IndentedHelpFormatter
36
37
38 class BannerHelpFormatter(IndentedHelpFormatter):
39
40 "Just a small tweak to optparse to be able to print a banner."
41
42 def __init__(self, banner, *argv, **argd):
43 self.banner = banner
44 IndentedHelpFormatter.__init__(self, *argv, **argd)
45
46 def format_usage(self, usage):
47 msg = IndentedHelpFormatter.format_usage(self, usage)
48 return '%s\n%s' % (self.banner, msg)
49
50
51 def main():
52
53 # Parse the command line arguments.
54 formatter = BannerHelpFormatter(
55 "Python script to use the Google search engine\n"
56 "By Mario Vilas (mvilas at gmail dot com)\n"
57 "https://github.com/MarioVilas/googlesearch\n"
58 )
59 parser = OptionParser(formatter=formatter)
60 parser.set_usage("%prog [options] query")
61 parser.add_option(
62 '--tld', metavar='TLD', type='string', default='com',
63 help="top level domain to use [default: com]")
64 parser.add_option(
65 '--lang', metavar='LANGUAGE', type='string', default='en',
66 help="produce results in the given language [default: en]")
67 parser.add_option(
68 '--domains', metavar='DOMAINS', type='string', default='',
69 help="comma separated list of domains to constrain the search to")
70 parser.add_option(
71 '--tbs', metavar='TBS', type='string', default='0',
72 help="produce results from period [default: 0]")
73 parser.add_option(
74 '--safe', metavar='SAFE', type='string', default='off',
75 help="kids safe search [default: off]")
76 parser.add_option(
77 '--type', metavar='TYPE', type='string', default='search', dest='tpe',
78 help="search type (search, images, videos, news, shopping, books,"
79 " apps) [default: search]")
80 parser.add_option(
81 '--country', metavar='COUNTRY', type='string', default='',
82 help="region to restrict search on [default: not restricted]")
83 parser.add_option(
84 '--num', metavar='NUMBER', type='int', default=10,
85 help="number of results per page [default: 10]")
86 parser.add_option(
87 '--start', metavar='NUMBER', type='int', default=0,
88 help="first result to retrieve [default: 0]")
89 parser.add_option(
90 '--stop', metavar='NUMBER', type='int', default=0,
91 help="last result to retrieve [default: unlimited]")
92 parser.add_option(
93 '--pause', metavar='SECONDS', type='float', default=2.0,
94 help="pause between HTTP requests [default: 2.0]")
95 parser.add_option(
96 '--rua', metavar='USERAGENT', action='store_true', default=False,
97 help="Randomize the User-Agent [default: no]")
98 (options, args) = parser.parse_args()
99 query = ' '.join(args)
100 if not query:
101 parser.print_help()
102 sys.exit(2)
103 params = [
104 (k, v) for (k, v) in options.__dict__.items()
105 if not k.startswith('_')]
106 params = dict(params)
107
108 # Split the comma separated list of domains, if present.
109 if 'domains' in params:
110 params['domains'] = [x.strip() for x in params['domains'].split(',')]
111
112 # Use a special search type if requested.
113 if 'tpe' in params:
114 tpe = params['tpe']
115 if tpe and tpe not in (
116 'search', 'images', 'videos', 'news',
117 'shopping', 'books', 'apps'):
118 parser.error("invalid type: %r" % tpe)
119 if tpe == 'search':
120 params['tpe'] = ''
121
122 # Randomize the user agent if requested.
123 if 'rua' in params and params.pop('rua'):
124 params['user_agent'] = get_random_user_agent()
125
126 # Run the query.
127 for url in search(query, **params):
128 print(url)
129 try:
130 sys.stdout.flush()
131 except Exception:
132 pass
133
134
135 if __name__ == '__main__':
136 main()
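
The option names map directly onto keyword arguments of `search()`; for
instance, `google --stop 5 --rua "python tutorial"` is roughly equivalent to
the following sketch (the query is an arbitrary example)::

    from googlesearch import search, get_random_user_agent

    for url in search('python tutorial', stop=5,
                      user_agent=get_random_user_agent()):
        print(url)
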
0 [bumpversion]
1 current_version = 1.1.1
2 commit = True
3 tag = True
4
5 [bumpversion:file:setup.py]
6 search = version='{current_version}'
7 replace = version='{new_version}'
8
9 [bumpversion:file:googlesearch/__init__.py]
10 search = __version__ = '{current_version}'
11 replace = __version__ = '{new_version}'
12
13 [bdist_wheel]
14 universal = 1
15
0 #!/usr/bin/env python
1 # -*- coding: utf-8 -*-
2
3 from setuptools import setup
4
5 with open('README.rst') as readme_file:
6     readme = readme_file.read()
7
8 with open('HISTORY.rst') as history_file:
9     history = history_file.read()
10
11 requirements = [
12     'beautifulsoup4',
13     'lxml',
14     'soupsieve'
15 ]
16
17 test_requirements = [
18 ]
19
20 setup(
21     name='google-search',
22     version='1.1.1',
23     description="Library for scraping google search results",
24     long_description=readme + '\n\n' + history,
25     author="Anthony Hseb",
26     author_email='[email protected]',
27     url='https://github.com/anthonyhseb/googlesearch',
28     packages=[
29         'googlesearch',
30     ],
31     package_dir={'googlesearch':
32                  'googlesearch'},
33     include_package_data=True,
34     install_requires=requirements,
35     license="MIT license",
36     zip_safe=False,
37     keywords='googlesearch',
38     classifiers=[
39         'Development Status :: 2 - Pre-Alpha',
40         'Intended Audience :: Developers',
41         'License :: OSI Approved :: MIT License',
42         'Natural Language :: English',
43         "Programming Language :: Python :: 2",
44         'Programming Language :: Python :: 2.7',
45         'Programming Language :: Python :: 3',
46         'Programming Language :: Python :: 3.6',
47         'Programming Language :: Python :: 3.8',
48     ],
49     test_suite='tests',
50     tests_require=test_requirements
51 )
(New empty file)
0 '''
1 Created on May 6, 2017
2
3 @author: anthony
4 '''
5 import unittest
6 from googlesearch.googlesearch import GoogleSearch
7
8 class TestGoogleSearch(unittest.TestCase):
9
10 def test_search(self):
11 num_results = 15
12 min_results = 11
13 max_results = 20
14 response = GoogleSearch().search("unittest", num_results = num_results)
15 self.assertTrue(response.total > 1000, "response.total is way too low")
16 self.assertTrue(len(response.results) >= min_results, "number of results is " + str(len(response.results)) + ", expected at least " + str(min_results))
17 self.assertTrue(len(response.results) <= max_results, "number of results is " + str(len(response.results)) + ", expected at most " + str(max_results))
18 for result in response.results:
19 self.assertTrue(result.url is not None, "result.url is None")
20 self.assertTrue(result.url.startswith("http"), "result.url is invalid: " + result.url)
21 for result in response.results:
22 self.assertTrue(result.get_text() is not None, "result.text is None")
23
24 if __name__ == '__main__':
25 unittest.main()