Python 2.7 Tutorial Pt 14


Derek Banas

13 years ago

15,486 views



Comments:

@derekbanas - 19.05.2011 16:13

@ma1achite I use Eclipse Classic. It's free and works with almost every language.

@derekbanas - 17.11.2011 17:56

@0Allhell Do a View Source in the browser to find out which tags you need to target. You can scrape anything that shows on the screen.
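Once View Source has shown you which tag and class wrap the content you want, the extraction step can be automated. The sketch below is a minimal Python 3 example using only the standard library's `html.parser` (the tutorial itself targets Python 2.7 with Beautiful Soup); the class `TagTextCollector` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every <tag> (optionally with a given class)."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside a matching tag
        self.current = []   # text fragments of the match in progress
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so we close at the right one.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.css_class is None or self.css_class in classes):
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Hypothetical page fragment: only the <h2 class="title"> should be picked up.
page = '<div class="story"><h2 class="title">First</h2><p>x</p></div><h2>Other</h2>'
collector = TagTextCollector("h2", "title")
collector.feed(page)
print(collector.results)
```

Swap in the tag name and class you found in the page source; if the site changes its markup, only those two arguments need updating.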

@entrevu - 31.12.2011 01:42

@ma1achite He's using Eclipse. Google "Eclipse IDE".

@derekbanas - 31.12.2011 02:11

@entrevu To scrape anything, you just need the basic concepts I covered here plus a better understanding of regular expressions. I did a tutorial in PHP that covers advanced website scraping, called Web Design and Programming Pt 24. The regular expression explanation there is identical for regex in Python. I hope that helps.
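As an illustration of the regex approach, here is a minimal Python 3 sketch that pulls link URLs and link text out of an HTML string with `re.findall`. The sample HTML and the pattern are assumptions for illustration; regexes like this are brittle against real-world markup, which is why a parser such as Beautiful Soup is often preferred.

```python
import re

# Hypothetical page fragment; in practice this string comes from the downloaded page.
html = '''
<a href="http://example.com/a">First story</a>
<a class="more" href="http://example.com/b">Second story</a>
'''

# <a\b          : an <a> tag (but not <abbr>, <audio>, ...)
# [^>]*?\bhref= : skip any attributes that come before href
# ([^"]*)       : group 1, the URL up to the closing quote
# (.*?)</a>     : group 2, the link text, non-greedy so each match stops at its own </a>
LINK_RE = re.compile(r'<a\b[^>]*?\bhref="([^"]*)"[^>]*>(.*?)</a>',
                     re.IGNORECASE | re.DOTALL)

# With two groups, findall returns a list of (url, text) tuples.
links = LINK_RE.findall(html)
for url, text in links:
    print(url, "->", text)
```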

@AlucardHelIsing - 17.08.2012 21:53

My only question is how to make Eclipse recognize the BeautifulSoup download. I used 'python setup.py install' in the terminal, so where do these files have to go? Where do I have to put BeautifulSoup.py or the other files that came with the install? As you would expect, in Eclipse I am getting the error: Unresolved import: BeautifulSoup
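An "Unresolved import" in Eclipse (PyDev) commonly means the IDE is pointed at a different Python interpreter than the one `python setup.py install` used, or that its library list was cached before the install. A minimal Python 3 sketch for checking this, run from the same terminal: it prints which interpreter is running and where that interpreter would import the package from, so you can compare against the interpreter configured in PyDev. The helper `module_location` is hypothetical.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# The interpreter that Eclipse/PyDev should be configured to use:
print("interpreter:", sys.executable)
# Where the package was installed, if this interpreter can see it at all:
print("bs4:", module_location("bs4"))
print("BeautifulSoup (v3):", module_location("BeautifulSoup"))
```

If the package is found here but Eclipse still complains, re-selecting or refreshing that interpreter in PyDev's interpreter settings is the usual next step.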

@derekbanas - 17.08.2012 22:31

Are you on a Mac or PC?

@AlucardHelIsing - 21.08.2012 19:47

Mac

@AlucardHelIsing - 21.08.2012 21:49

Figured it out. Now I'm just getting errors with re.findall giving a TypeError: expected string or buffer
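That `TypeError` means `re.findall` was handed something that is not text, typically the response object returned by `urlopen()` (or a Beautiful Soup tag or result list) instead of the page's HTML string. A minimal Python 3 sketch of the mistake and the fix; `FakeResponse` is a hypothetical stand-in so the example runs without a network. In Python 3 the body is bytes, so it also needs decoding before a text pattern can search it. For a Beautiful Soup object, converting it with `str()` first gives `re` a string.

```python
import re


class FakeResponse:
    """Hypothetical stand-in for the file-like object urlopen() returns."""

    def __init__(self, body):
        self._body = body

    def read(self):
        return self._body


TITLE_RE = r"<title>(.*?)</title>"
response = FakeResponse(b"<title>Hello</title>")

# Wrong: passing the response object itself raises TypeError.
try:
    re.findall(TITLE_RE, response)
except TypeError as err:
    print("TypeError:", err)

# Right: read the body, decode bytes to str, then search the text.
text = response.read().decode("utf-8")
titles = re.findall(TITLE_RE, text)
print(titles)
```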

@paulasf2820 - 01.11.2012 02:18

Hi Derek. I need your help. Do you have an email? I will write a lot... I hope you answer.

@derekbanas - 01.11.2012 02:53

Send me an email and I'll see if I can help [email protected]

@harendraSinghIIITDMJ - 21.12.2012 17:20

My network is behind a proxy, so when I open a web page it asks me for a username and password. Is there any way to store the username and password in the program itself so that it doesn't ask me? I searched and tried urllib2 proxy handlers, but got an error.
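One common way to do this, assuming the proxy accepts HTTP Basic authentication, is to put the credentials in the proxy URL given to a `ProxyHandler`. A minimal Python 3 sketch with `urllib.request` (in Python 2.7 the same names live in `urllib2`); the proxy host, port, user, and password are hypothetical placeholders. Proxies that require other schemes such as NTLM will not accept this.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical proxy address and credentials; substitute your own.
PROXY_HOST = "proxy.example.com:8080"
PROXY_USER = "alice"
PROXY_PASS = "secret"

# Percent-encode the credentials so characters like '@' or ':' don't break the URL.
proxy_url = "http://%s:%s@%s" % (quote(PROXY_USER, safe=""),
                                 quote(PROXY_PASS, safe=""),
                                 PROXY_HOST)

proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
opener = urllib.request.build_opener(proxy_handler)

# Route every later urllib.request.urlopen() call through the authenticated proxy.
urllib.request.install_opener(opener)
# html = urllib.request.urlopen("http://example.com/").read()
```

Keeping the password in source code is convenient but risky; reading it from an environment variable is a safer variation.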

@derekbanas - 21.12.2012 20:43

Sorry, but I'd have to know more about how that information is checked.

@TheMariouka - 13.03.2013 19:22

Hello! I'm wondering whether you have, or know of, a tutorial on scraping pages that are auto-generated with JavaScript.

@pavanjared - 19.03.2013 05:40

What did you do to fix this error importing BS?

@theLach1234 - 13.05.2013 00:57

I used your exact code, but I only get the links and the titles. The code fails to output the article snippet. Any help? Has the Huffington Post feed changed?

@derekbanas - 13.05.2013 04:07

They may have changed the tags a bit. Check whether the tag around the snippet has changed.
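To check whether a feed's tags changed, it helps to list the child tags of one item and to read each field with a default, so a missing or renamed tag shows up as an empty value instead of silently dropping output. A minimal Python 3 sketch using the standard library's `xml.etree.ElementTree` on a hypothetical RSS 2.0 feed; the real Huffington Post feed structure is not known here, and `parse_items` and `item_tags` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS feed; the second item has no <description>.
FEED = """<rss version="2.0"><channel>
<item><title>Story A</title><link>http://example.com/a</link>
<description>Snippet A</description></item>
<item><title>Story B</title><link>http://example.com/b</link></item>
</channel></rss>"""


def item_tags(xml_text):
    """List the child tag names of the first <item>, to spot renamed tags."""
    first = ET.fromstring(xml_text).find(".//item")
    return [child.tag for child in first] if first is not None else []


def parse_items(xml_text):
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # findtext returns the default when the tag is missing,
            # so a renamed snippet tag shows up as an empty string.
            "snippet": item.findtext("description", default=""),
        })
    return items


print(item_tags(FEED))
for entry in parse_items(FEED):
    print(entry["title"], "|", entry["snippet"] or "(no snippet found)")
```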

@herp_derpingson - 20.05.2013 08:49

from bs4 import BeautifulSoup

@sainaths224 - 21.08.2013 04:34

Hi Derek, I have a question: how do I pass credentials to scrape a website?
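If the site protects its pages with HTTP Basic authentication (the browser pops up a login dialog), the credentials can be registered once with a password manager and answered automatically. A minimal Python 3 sketch with `urllib.request` (the same classes exist in Python 2.7's `urllib2`); the site URL, user, and password are hypothetical. Sites that use an HTML login form instead need a POST of the form fields plus cookie handling (for example `http.cookiejar`), which this does not cover.

```python
import urllib.request

# Hypothetical protected site and account.
SITE = "http://example.com/members/"
USER = "alice"
PASSWORD = "secret"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None: use these credentials for whatever realm the server names under SITE.
password_mgr.add_password(None, SITE, USER, PASSWORD)

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# The handler answers the server's 401 challenge with the stored credentials.
# html = opener.open(SITE).read()
```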

@emgoldexgreeceemgoldex2907 - 05.10.2013 00:56

Hello again, it's been a while... I was wondering which is the best method to use for web scraping: cURL? Beautiful Soup? get_html? For example, I can block cURL from my site through config.ini... So I want to start scraping, but I don't know which method is the right or best one to use.

@derekbanas - 05.10.2013 02:02

I actually use PHP most of the time, but in Python, Beautiful Soup has improved lately and is quite good.
