0 Basic workflow
1 Create the project: scrapy startproject <project name>
2 cd into the spiders folder
3 Create the spider file: scrapy genspider -t crawl <spider name> <domain to crawl> (a concrete walk-through follows below)
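For example (the names here are hypothetical, chosen to match the read_book_101 project used in the rest of this post):

```
scrapy startproject read_book_101
cd read_book_101/read_book_101/spiders
scrapy genspider -t crawl read_book www.dushu.com
```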
1 Setting up a log file in settings.py

```python
# Generally not done this way
# LOG_LEVEL = 'WARNING'
# The recommended approach is a log file
LOG_FILE = 'log.log'
```
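Once LOG_FILE is set, everything Scrapy logs is written to log.log instead of the console, including messages from a spider's built-in self.logger. A minimal sketch (this demo spider is not part of the project, it only illustrates the logging behavior):

```python
import scrapy


class DemoSpider(scrapy.Spider):
    name = 'demo'  # hypothetical spider, for illustration only
    start_urls = ['https://www.dushu.com/']

    def parse(self, response):
        # With LOG_FILE = 'log.log' this message lands in log.log; it would
        # also survive LOG_LEVEL = 'WARNING', since WARNING is the cutoff.
        self.logger.warning('parsed %s', response.url)
```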
2 Using Scrapy to crawl book names and image URLs from 读书网 (www.dushu.com)

2.1 Create the project

```
scrapy startproject <project name>
```

2.2 Create the spider

```
scrapy genspider <spider name> <domain>
# the domain, e.g. www.baidu.com
```

2.3 Write the crawling logic in the spider file

```python
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from read_book_101.items import ReadBook101Item


class ReadbookSpider(CrawlSpider):
    name = 'read_book'
    allowed_domains = ['www.dushu.com']
    start_urls = ['https://www.dushu.com/book/1188_1.html']

    rules = (
        Rule(LinkExtractor(allow=r'/book/1188_\d+\.html'),
             callback='parse_item',
             follow=True),
    )

    def parse_item(self, response):
        img_list = response.xpath('//div[@class="bookslist"]//img')
        for img in img_list:
            # alt carries the book name; data-original carries the
            # lazily loaded image URL
            name = img.xpath('./@alt').extract_first()
            src = img.xpath('./@data-original').extract_first()
            book = ReadBook101Item(name=name, src=src)
            yield book
```

2.4 items.py

```python
import scrapy


class ReadBook101Item(scrapy.Item):
    name = scrapy.Field()
    src = scrapy.Field()
```

2.5 pipelines.py

```python
from itemadapter import ItemAdapter
from scrapy.utils.project import get_project_settings  # loads the settings file
import pymysql


class ReadBook101Pipeline:
    def open_spider(self, spider):
        self.fp = open('book.json', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        self.fp.write(str(item))
        return item

    def close_spider(self, spider):
        self.fp.close()


class MysqlPipeline:
    def open_spider(self, spider):
        settings = get_project_settings()
        self.host = settings['DB_HOST']
        self.user = settings['DB_USER']
        self.password = settings['DB_PASSWORD']
        self.name = settings['DB_NAME']
        self.port = settings['DB_PORT']
        self.charset = settings['DB_CHARSET']
        self.connect()

    def connect(self):
        self.conn = pymysql.connect(
            user=self.user,
            password=self.password,
            host=self.host,
            database=self.name,
            port=self.port,
            charset=self.charset,
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        # Note: a parameterized query (cursor.execute(sql, params)) would be
        # safer than string formatting, which breaks on quotes in the data.
        sql = 'insert into book(name, src) values("{}", "{}")'.format(
            item['name'], item['src'])
        self.cursor.execute(sql)
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
```

2.6 Enable the pipelines and configure the database in settings.py

```python
DB_HOST = '127.0.0.1'
DB_PORT = 3306
DB_USER = 'root'
DB_PASSWORD = 'root'
DB_NAME = 'spider01'
# 'utf-8' is not allowed here, otherwise it raises a NoneType… error
DB_CHARSET = 'utf8'

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'read_book_101.pipelines.ReadBook101Pipeline': 300,
    'read_book_101.pipelines.MysqlPipeline': 301,
}
```
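One thing the post leaves implicit: MysqlPipeline assumes the spider01 database already contains a book table. A one-off setup sketch (the column sizes are my assumption, not from the original):

```python
import pymysql

# Creates the database and table that MysqlPipeline writes into.
conn = pymysql.connect(user='root', password='root',
                       host='127.0.0.1', port=3306, charset='utf8')
cursor = conn.cursor()
cursor.execute('CREATE DATABASE IF NOT EXISTS spider01 CHARACTER SET utf8')
cursor.execute(
    'CREATE TABLE IF NOT EXISTS spider01.book ('
    ' id INT PRIMARY KEY AUTO_INCREMENT,'
    ' name VARCHAR(255),'   # book name from the alt attribute
    ' src VARCHAR(255))'    # image URL from the data-original attribute
)
conn.commit()
cursor.close()
conn.close()
```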
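With everything in place, the crawl is started from the project root with scrapy crawl read_book (read_book being the name defined in the spider); ReadBook101Pipeline then writes book.json next to scrapy.cfg. A quick way to check what MysqlPipeline wrote, reusing the credentials from 2.6 (a sketch, assuming the crawl has finished):

```python
import pymysql

conn = pymysql.connect(user='root', password='root', host='127.0.0.1',
                       port=3306, database='spider01', charset='utf8')
cursor = conn.cursor()
cursor.execute('SELECT COUNT(*) FROM book')
print(cursor.fetchone()[0])  # number of books inserted by MysqlPipeline
cursor.close()
conn.close()
```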