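Editor's recommendation:
In this article, we have selected 24 Python libraries for data science. They cover different data science functions, such as data collection, data cleaning, data exploration, and modeling, and are introduced by category below. This article comes from the WeChat official account PanChuang AI (xunixs) and was edited and recommended by Alice of Huolongguo Software.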
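Introduction
I am a big fan of the Python language; it was the first programming language I learned for data science. Three things set Python apart:
Its ease of use and flexibility
Industry-wide acceptance: it is the most popular language for data science in the industry
The sheer number of Python libraries for data science
In fact, there are so many Python libraries that keeping up with their development can become very difficult. That is why I decided to take away that pain and compile this list of 24 Python libraries. In other words, these 24 libraries should cover most of what you need in data science!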

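That's right: I have categorized these libraries by their role in data science. So you will find libraries for data cleaning, data manipulation, visualization, model building, and even model deployment (among others). It is a fairly comprehensive list to help you get started on your data science journey with Python.
Python libraries for different data science tasks: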
Python libraries for data collection:
Beautiful Soup
Scrapy
Selenium
Python libraries for data cleaning and manipulation:
Pandas
PyOD
NumPy
spaCy
Python libraries for data visualization:
Matplotlib
Seaborn
Bokeh
Python libraries for modeling:
Scikit-learn
TensorFlow
PyTorch
Python libraries for model interpretability:
Lime
H2O
Python libraries for audio processing:
Librosa
Madmom
pyAudioAnalysis
Python libraries for image processing:
OpenCV-Python
Scikit-image
Pillow
Python libraries for databases:
Psycopg
SQLAlchemy
Python libraries for deployment:
Flask
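Python libraries for data collection
Have you ever found yourself without enough data for the problem you want to solve? It is a perennial issue in data science. That is why learning how to extract and collect data is such a crucial skill for a data scientist. It opens up avenues that were not possible before.
So here are three useful Python libraries for extracting and collecting data.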
/* Beautiful Soup */
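One of the best ways of collecting data is by scraping websites (ethically and legally, of course!). Doing it by hand takes far too much effort and time. Beautiful Soup is your savior.
Beautiful Soup is an HTML and XML parser that creates a parse tree for the parsed page, which is then used to extract data from it. The process of extracting data from web pages is called web scraping.
Use the following command to install Beautiful Soup:
pip install beautifulsoup4
Here is a simple piece of code that uses Beautiful Soup to extract all the anchor tags from an HTML page: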
#!/usr/bin/python3
# Anchor extraction from an HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen

# 'LINK' is a placeholder for the URL of the page to scrape
with urlopen('LINK') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        # Print each link target; fall back to '/' when href is missing
        print(anchor.get('href', '/'))
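The snippet above needs a real URL in place of 'LINK'. As a minimal offline sketch, the same kind of extraction can be tried on an inline HTML string. The markup and the names `html_doc` and `links` are made up for illustration and are not part of the original article.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used here instead of a downloaded page
html_doc = """
<html><body>
  <h1>Sample page</h1>
  <a href="https://example.com/first">First link</a>
  <a href="/second">Second link</a>
  <a>Anchor without an href</a>
</body></html>
"""

# Build the parse tree with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the href of every anchor tag, skipping anchors that have none
links = [a["href"] for a in soup.find_all("a") if a.has_attr("href")]
print(links)

# The parse tree can also be navigated by tag name
print(soup.h1.get_text())
```

Unlike the `anchor.get('href', '/')` fallback used above, this sketch simply skips anchors without an `href`; which behavior is preferable depends on what the links are used for.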
I recommend the following article to learn how to use Beautiful Soup in Python:
Beginner's Guide on Web Scraping in Python (using BeautifulSoup)
/* Scrapy */
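Scrapy is another very useful Python library for web scraping. It is an open source, collaborative framework for extracting the data you need from websites. It is fast and simple to use.
Here is the command to install Scrapy:
pip install scrapy

It is a framework for large-scale web scraping. It gives you all the tools you need to efficiently extract data from websites, process it as required, and store it in your preferred structure and format.
Here is a simple piece of code that implements a Scrapy spider: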
import scrapy

class Spider(scrapy.Spider):
    name = 'NAME'  # placeholder spider name
    start_urls = ['LINK']  # placeholder start URL

    def parse(self, response):
        # Yield the title of each post on the page
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}

        # Follow pagination links and parse them the same way
        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)
Here is a good tutorial for learning Scrapy and implementing it in Python:
Web Scraping in Python using Scrapy (with multiple examples)
/* Selenium */
Selenium is a popular tool for automating browsers. It is mainly used for testing in the industry, but it is also very handy for web scraping. Selenium has become so popular in the IT field that I am sure many of you have at least heard of it.

We can easily write a Python script to automate a web browser using Selenium. It lets us extract data efficiently and store it in our preferred format for later use.
I recently wrote an article about scraping YouTube video data using Python and Selenium:
Data Science Project: Scraping YouTube Data using Python and Selenium to Classify Videos
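Python libraries for data cleaning and manipulation
All right, so you have collected your data and are ready to dive in. Now it is time to clean up any messy data we may face and learn how to manipulate it so that it is ready for modeling.
Here are four Python libraries that will help you do just that. Remember, we will be dealing with both structured (numeric) data and text (unstructured) data in the real world, and this list of libraries covers it all.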
/* Pandas */
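When it comes to data manipulation and analysis, nothing beats Pandas. It is one of the most popular Python libraries today. Pandas is written in Python and is designed specifically for data manipulation and analysis tasks.
Pandas ships with the Anaconda distribution; on a plain Python installation, it can be installed with:
pip install pandas

Pandas provides features such as:
Dataset joining and merging
Deleting and inserting data structure columns
Data filtering
Reshaping datasets
DataFrame objects for manipulating data, and much more!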
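The features listed above can be sketched in a few lines. This is a minimal illustration on two made-up tables; the column names, values, and the 10% tax factor are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical toy tables
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, 40.0, 15.0, 60.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Delhi", "Mumbai", "Delhi"],
})

# Joining and merging: attach each customer's city to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Column insertion and deletion
merged["amount_with_tax"] = merged["amount"] * 1.1
merged = merged.drop(columns=["order_id"])

# Filtering: keep orders with an amount above 20
large = merged[merged["amount"] > 20]

# Reshaping: total amount per city and customer as a pivot table
pivot = merged.pivot_table(index="city", columns="customer_id",
                           values="amount", aggfunc="sum")

print(large)
print(pivot)
```

Cells of the pivot table with no matching orders (for example Mumbai and customer 10) show up as NaN.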
Here is an article and a handy cheat sheet to bring your Pandas skills up to speed:
12 Useful Pandas Techniques in Python for Data Manipulation
CheatSheet: Data Exploration using Pandas in Python
/* PyOD */
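Struggling with detecting outliers? You are not alone. It is a common problem among aspiring (and even established) data scientists. How do you define outliers in the first place?
Don't worry, the PyOD library is here to help.
PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects. Outlier detection is basically identifying rare items or observations that differ significantly from the majority of the data.
You can install PyOD with the following command:
pip install pyod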
Want to understand how PyOD works and how to implement it on your own? The guide below will answer all your PyOD questions:
An Awesome Tutorial to Learn Outlier Detection in Python using PyOD Library
/* NumPy */
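Like Pandas, NumPy is an incredibly popular Python library. NumPy provides support for large multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on them.
NumPy is an open source library with many contributors. It ships with the Anaconda distribution; on a plain Python installation, it can be installed with:
pip install numpy
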
# Creating arrays
import numpy as np
x = np.array([1, 2, 3])
print(x)
y = np.arange(10)
print(y)
# output - [1 2 3]
#          [0 1 2 3 4 5 6 7 8 9]

# Basic operations
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)
c = a - b
print(c)
print(a**2)
# output - [1.         1.33333333 1.66666667 4.        ]
#          [ 1  4  9 36]
And much more!
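The examples above use one-dimensional arrays only. As a small sketch of the multi-dimensional arrays and mathematical functions mentioned earlier, here is a 2-D example; the matrix values are arbitrary.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns
m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m.shape)         # (2, 3)

# Element-wise math function applied to every entry
print(np.sqrt(m))

# Aggregation along an axis: column sums
col_sums = m.sum(axis=0)
print(col_sums)        # [5 7 9]

# Matrix product of m (2x3) with its transpose (3x2) gives a 2x2 matrix
product = m @ m.T
print(product)
```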
/* SpaCy */
So far, we have discussed how to clean and manipulate numerical data. But what if you are working with text data?
spaCy is a very useful and flexible natural language processing (NLP) library and framework for cleaning text documents in preparation for model building. It is fast compared with other libraries used for similar tasks.
To install spaCy on Linux:
pip install -U spacy
python -m spacy download en_core_web_sm
To install it on other operating systems, refer to this link (https://spacy.io/usage/).
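As a minimal sketch of text cleaning with spaCy, the example below removes stop words and punctuation from a sentence. It assumes a blank English pipeline (`spacy.blank("en")`), which only tokenizes and needs no downloaded model; tasks such as lemmatization or named entity recognition would need a trained pipeline instead. The sample sentence is made up.

```python
import spacy

# Blank English pipeline: tokenizer only, no model download required
nlp = spacy.blank("en")

text = "Data scientists are cleaning the messy text before modeling!"
doc = nlp(text)

# Keep lowercase tokens that are neither stop words nor punctuation
cleaned = [token.lower_ for token in doc
           if not token.is_stop and not token.is_punct]
print(cleaned)
```

The resulting list of tokens can then be passed on to a vectorizer or another feature extraction step.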

And of course, we have you covered for learning spaCy:
Natural Language Processing Made Easy - using SpaCy (in Python)
Python libraries for data visualization
What's next? My favorite part of the whole data science pipeline: data visualization! This is where our hypotheses get checked visually.
Here are three great Python libraries for data visualization.
/* Matplotlib */
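Matplotlib is the most popular data visualization library in Python. It lets us generate and build a wide variety of plots. It can also be used together with Seaborn.
You can install matplotlib with the following command:
pip install matplotlib

Here are a few examples of the different kinds of plots we can build using matplotlib: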
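# Histogram
# %matplotlib inline is a Jupyter magic; omit it when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt
from numpy.random import normal

x = normal(size=100)  # 100 samples from a standard normal distribution
plt.hist(x, bins=20)
plt.show()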

# 3D plot
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-10, 10, 0.1)
Y = np.arange(-10, 10, 0.1)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1,
                       cmap=cm.coolwarm)
plt.show()

Now that we have covered Pandas, NumPy, and matplotlib, check out the tutorial below, which brings these three Python libraries together:
Ultimate guide for Data Exploration in Python using NumPy, Matplotlib and Pandas
/* Seaborn */
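Seaborn is another plotting library built on matplotlib. It is a Python library that provides a high-level interface for drawing attractive graphics. What matplotlib can do, Seaborn does in a more visually appealing way.
Some of the features of Seaborn are:
A dataset-oriented API for examining relationships between multiple variables
Convenient views onto the overall structure of complex datasets
Tools for choosing color palettes that reveal patterns in your data
You can install Seaborn using just one line of code:
pip install seaborn

Let's go through some cool plots to see what seaborn can do: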
import seaborn as sns

sns.set()
# Load the example "tips" dataset (downloaded on first use)
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip",
            col="time",
            hue="smoker", style="smoker",
            size="size",
            data=tips);

/* Bokeh */
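Bokeh is an interactive visualization library that targets modern web browsers for presentation. It provides elegant construction of versatile graphics for large datasets.
Bokeh can be used to create interactive plots, dashboards, and data applications. To install it:
pip install bokeh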
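As a minimal sketch of an interactive Bokeh chart, the example below draws a line with markers and writes it to a standalone HTML file. The data points, labels, and the file name `bokeh_demo.html` are made up for illustration.

```python
from bokeh.plotting import figure, output_file, save

# Hypothetical data points
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Send the output to a standalone HTML file (file name is arbitrary)
output_file("bokeh_demo.html", title="Bokeh demo")

# Figure with Bokeh's default interactive toolbar
plot = figure(title="Simple line plot", x_axis_label="x", y_axis_label="y")
plot.line(x, y, legend_label="values", line_width=2)
plot.scatter(x, y, size=8)

# Write the HTML file; bokeh.plotting.show(plot) would also open it in a browser
save(plot)
```

Opening the generated file in a browser gives a chart that can be panned and zoomed without any running Python process.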

Feel free to read the following article to learn more about Bokeh and see it in action:
Interactive Data Visualization using Bokeh (in Python)