BeautifulSoup Scraping US News Today Stock <table>

Using Python, I am trying to scrape the table of stocks under $10 from the US News Money "Stocks Under $10" page, then append each element to a list (so that I can iterate over each stock). I currently have this code:

import requests
import bs4 as bs

resp = requests.get('https://money.usnews.com/investing/stocks/stocks-under-10')
soup = bs.BeautifulSoup(resp.text, "lxml")
# find() returns None when no matching <table> exists in the downloaded HTML
table = soup.find('table', {'class': 'table stock full-row search-content'})
tickers = []
for row in table.findAll('tr')[1:]:  # skip the header row
    ticker = str(row.findAll('td')[0].text)
    tickers.append(ticker)

I keep getting this error:

Traceback (most recent call last):
  File "sandp.py", line 98, in <module>
    sandp(0)
  File "sandp.py", line 40, in sandp
    for row in table.findAll('tr')[1:]:
AttributeError: 'NoneType' object has no attribute 'findAll'

python,web-scraping,beautifulsoup,stocks,


Answers: 1


2 (accepted)

The site is dynamic (the listing is rendered by JavaScript after the page loads), so you can use selenium:

import collections
import re

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Passing the driver path positionally is Selenium 3 style; newer releases may need webdriver.Chrome() instead
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://money.usnews.com/investing/stocks/stocks-under-10')
while True:
  try:
    d.find_element(By.LINK_TEXT, "Load More").click()  # keep clicking to load all data
  except Exception:  # no "Load More" link left, or it can no longer be clicked
    break
company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])
headers = [['a', {'class':'search-result-link'}], ['a', {'class':'text-muted'}], ['p', {'class':'text-small show-for-medium-up ellipsis'}], ['dl', {'class':'inline-dl'}], ['span', {'class':'stock-trend'}], ['div', {'class':'flex-row'}]]
# One list of text values per result card; None where a tag is missing
final_data = [[getattr(i.find(a, b), 'text', None) for a, b in headers] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'search-result flex-row'})]
# Strip newline-plus-indentation runs from the description and tokenize the remaining fields
new_data = [[i[0], i[1], re.sub(r'\n+\s{2,}', '', i[2] or ''), [re.findall(r'[$\w.%/]+', t or '') for t in i[3:]]] for i in final_data]
# Keep only the tokens that contain a digit and label them
final_results = [i[:3]+[dict(zip(['Price', 'Daily Change', 'Percent Change'], filter(lambda x: re.search(r'\d', x), i[-1][0])))] for i in new_data]
new_results = [company(*i) for i in final_results]

Output (first company):

company(name=u'Aileron Therapeutics Inc', abbreviation=u'ALRN', description=u'Aileron Therapeutics, Inc. is a clinical stage biopharmaceutical company, which focuses on developing and commercializing stapled peptides. Its ALRN-6924 product targets the tumor suppressor p53 for the treatment of a wide variety of cancers. It also offers the MDMX and MDM2. The company was founded by Gregory L. Verdine, Rosana Kapeller, Huw M. Nash, Joseph A. Yanchik III, and Loren David Walensky in June 2005 and is headquartered in Cambridge, MA.more\n', stats={'Daily Change': u'$0.02', 'Price': u'$6.04', 'Percent Change': u'0.33%'})
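
The stats step in the code above is dense, so here is a minimal sketch of the same idea as a standalone function: tokenize the text, keep the tokens that contain a digit, and label them in order. The function name parse_stats and the sample string are hypothetical; the real text of the stats element on the page may be laid out differently.

```python
import re

def parse_stats(text):
    """Label the numeric tokens of a stats string, in order of appearance."""
    # Split into runs of word characters plus $ . % /
    tokens = re.findall(r'[$\w.%/]+', text)
    # Drop pure labels such as "Price"; keep anything containing a digit
    numeric = [t for t in tokens if re.search(r'\d', t)]
    return dict(zip(['Price', 'Daily Change', 'Percent Change'], numeric))

# Hypothetical sample text, not copied from the site
sample = "Price $6.04 Change $0.02 (0.33%)"
print(parse_stats(sample))
```

Because zip stops at the shorter input, a card with fewer than three numeric tokens simply yields a smaller dict instead of raising.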

Edit:

All abbreviations:

abbrevs = [i.abbreviation for i in new_results]

Output:

[u'ALRN', u'HAIR', u'ONCY', u'EAST', u'CERC', u'ENPH', u'CASI', u'AMBO', u'CWBR', u'TRXC', u'NIHD', u'LGCY', u'MRNS', u'RFIL', u'AUTO', u'NEPT', u'ARQL', u'ITUS', u'SRAX', u'APTO']
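
As for the AttributeError in the question: find() returns None when nothing matches, which is what happens when the table is missing from the HTML that requests downloads. A minimal sketch of a guard, assuming a hypothetical helper extract_tickers and made-up HTML snippets standing in for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a page shell whose table is filled in later by JavaScript
STATIC_HTML = "<html><body><div id='app'></div></body></html>"

def extract_tickers(html):
    """Return the first-column text of each data row, or [] if there is no table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:  # nothing matched, so there is nothing to iterate over
        return []
    tickers = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if cells:
            tickers.append(cells[0].get_text(strip=True))
    return tickers

print(extract_tickers(STATIC_HTML))
```

The guard only turns the crash into an empty result; getting the actual rows still requires rendering the page, for example with selenium as above.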