Ways to rip internet-data into a spreadsheet? - Printable Version
+- kanji koohii FORUM (http://forum.koohii.com)
+-- Forum: Learning Japanese (http://forum.koohii.com/forum-4.html)
+--- Forum: Off topic (http://forum.koohii.com/forum-13.html)
+--- Thread: Ways to rip internet-data into a spreadsheet? (/thread-9254.html)

Ways to rip internet-data into a spreadsheet? - Mesqueeb - 2012-03-29

Hello! I am working on a project and I need some information from a certain wiki website. What I need is one section from each of several (over 500) wiki pages, automatically copied to a spreadsheet.

By hand it takes me about 20 seconds per page x 500 = 10,000 seconds = about 167 minutes = roughly 3 hours (loosely counted).

Is there any program which can do this? Or any other tricks? Thanks!

Ways to rip internet-data into a spreadsheet? - Blahah - 2012-04-08

I would do this with Python and the BeautifulSoup package. It makes it quite easy to parse HTML content.

Ways to rip internet-data into a spreadsheet? - Mesqueeb - 2012-05-22

Blahah Wrote: I would do this with Python and the BeautifulSoup package. It makes it quite easy to parse HTML content.

I think I managed to get it installed on my Python 2.7, and when I go to the terminal and follow this site: http://www.crummy.com/software/BeautifulSoup/bs3/download/2.x/documentation.html upon entering "python" and then:

Code:
>>> from BeautifulSoup import BeautifulSoup

I have Python 3 and 2.7 both installed (don't know why); maybe that is jamming it? I installed it like this:

Code:
Mesqueebiator:~ Mesqueeb$ cd /Users/Mesqueeb/Downloads/beautifulsoup4-4.0.5

Ways to rip internet-data into a spreadsheet? - netsplitter - 2012-05-22

Is that all the output you get? I'm not too familiar with Macs, but don't you need to do that as a privileged user (with sudo)?

Mesqueeb Wrote: I have python 3 and 2.7 both installed (don't know why) maybe that is jamming it?

It's possible. It looks like the "python" command is using Python 2.7 (since that's where it installed to), so it should be working. Check anyway: when you first enter "python", the very first line at the top will tell you which version it's running.

Code:
Python 2.7.3 (default, Apr 24 2012, 00:00:54)

Ways to rip internet-data into a spreadsheet? - Mesqueeb - 2012-05-22

netsplitter Wrote: Is that all the output you get? I'm not too familiar with Macs, but don't you need to do that as a privileged user (with sudo)?

It gave the same result.

netsplitter Wrote: When you first enter "python", the very first line at the top will tell you which version it's running.

It is Python 2.7.3. But I'm completely new at Python, and actually don't know how to use it. xD I would like to copy and paste information from several pages on the web into one spreadsheet or text file. Can you help me? ^^

Ways to rip internet-data into a spreadsheet? - vix86 - 2012-05-22

It's

Code:
from bs4 import BeautifulSoup

Ways to rip internet-data into a spreadsheet? - Blahah - 2012-05-22

Personally I use ActiveState Python, and I have the 2.7 distribution installed. Then, to install BeautifulSoup, you type:

Code:
pypm install beautifulsoup

Code:
python
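
For anyone landing on this thread with the same task, here is a minimal sketch of the "one section per page into a spreadsheet" idea. It is not the approach anyone above posted: to avoid the install problems discussed in the thread, it uses only the Python 3 standard library (`html.parser` and `csv`) instead of BeautifulSoup. It also assumes the wanted section starts at an `<h2>` heading and runs until the next `<h2>`. The page names, the sample HTML, the heading "Etymology", and the helper names `SectionExtractor`, `extract_section`, and `pages_to_csv` are all made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect the text between a given <h2> heading and the next <h2>."""

    def __init__(self, heading):
        super().__init__()
        self.heading = heading   # heading text of the wanted section
        self.in_h2 = False       # True while inside an <h2> tag
        self.h2_text = []        # text of the <h2> currently being read
        self.capturing = False   # True while inside the wanted section
        self.chunks = []         # text collected from the wanted section

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
            self.h2_text = []
            self.capturing = False  # any new <h2> ends the current section

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False
            title = "".join(self.h2_text).strip()
            self.capturing = (title == self.heading)

    def handle_data(self, data):
        if self.in_h2:
            self.h2_text.append(data)
        elif self.capturing:
            self.chunks.append(data)

    def section_text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self.chunks).split())


def extract_section(html, heading):
    """Return the plain text of one section, or "" if the heading is absent."""
    parser = SectionExtractor(heading)
    parser.feed(html)
    parser.close()
    return parser.section_text()


def pages_to_csv(pages, heading, out):
    """Write one CSV row per page: page name, then the section text."""
    writer = csv.writer(out)
    writer.writerow(["page", heading])
    for name, html in pages.items():
        writer.writerow([name, extract_section(html, heading)])


# Hypothetical sample pages standing in for downloaded wiki HTML.
SAMPLE_PAGES = {
    "Page A": "<h1>Page A</h1><h2>Etymology</h2><p>From <b>old</b> word.</p>"
              "<h2>Usage</h2><p>Common.</p>",
    "Page B": "<h1>Page B</h1><h2>Usage</h2><p>Rare.</p>",
}

buffer = io.StringIO()
pages_to_csv(SAMPLE_PAGES, "Etymology", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In real use, the HTML strings would come from downloading each page (for example with `urllib.request`) and the output would go to a file opened with `newline=""` rather than an in-memory buffer. Any spreadsheet program can open the resulting CSV. With BeautifulSoup installed, the parsing class could likely be replaced by a few lines of tag lookups, which is why it was suggested above.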