How can I crawl Zhihu answers with Java WebMagic?
2019-10-12 15:44:38

When I use WebMagic to crawl all the answers to a Zhihu question, I only ever get the first two answers.
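
For reference, the request URL in the log below suggests that the answers come from a paginated JSON endpoint, `/api/v4/questions/{id}/answers`, where `limit` sets the page size and `offset` sets the starting position. Here is a minimal sketch, assuming that reading of the URL, that builds the page URLs for one question using only the JDK. The `include` value is passed through unchanged, already URL-encoded as copied from the browser's request. The class name `ZhihuAnswerPages` and the `total` parameter are hypothetical. A real crawler would stop once a page comes back empty instead of relying on a known total.

```java
import java.util.ArrayList;
import java.util.List;

public class ZhihuAnswerPages {
    // Endpoint layout taken from the URL in the log; `include` is inserted verbatim,
    // so it must already be URL-encoded.
    private static final String TEMPLATE =
            "https://www.zhihu.com/api/v4/questions/%d/answers"
            + "?sort_by=default&include=%s&limit=%d&offset=%d";

    /**
     * Builds one URL per page: offset = 0, limit, 2*limit, ... while offset < total.
     * `total` is a hypothetical known answer count; a real crawler would instead
     * stop when a page returns no answers.
     */
    public static List<String> pageUrls(long questionId, String include, int limit, int total) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<String> urls = new ArrayList<>();
        for (int offset = 0; offset < total; offset += limit) {
            urls.add(String.format(TEMPLATE, questionId, include, limit, offset));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Shortened `include` value (a subset of the one in the log), for illustration only.
        String include = "data%5B%2A%5D.content%2Cvoteup_count";
        for (String url : pageUrls(29688243L, include, 3, 7)) {
            System.out.println(url);
        }
    }
}
```

In WebMagic, URLs like these would typically be queued with `Spider.addUrl(...)` or, from inside a `PageProcessor`, with `page.addTargetRequests(...)`. Building the URLs correctly does not by itself get past the 401 on `offset=3` shown in the log, which the log reports as an authentication failure.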

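I've read all kinds of blog posts and tried every approach I could find, but it always returns just 2 answers or fails outright with a 401: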

o.a.h.impl.execchain.MainClientExec - Connection can be kept alive indefinitely
o.a.http.impl.auth.HttpAuthenticator - Authentication required
o.a.http.impl.auth.HttpAuthenticator - www.zhihu.com:443 requested authentication
o.a.http.impl.auth.HttpAuthenticator - Response contains no authentication challenges
o.a.h.c.p.ResponseProcessCookies - Cookie accepted [aliyungf_tc="AQAAAD1PxXQABgUA7CesO3+7/0/iFhJt", version:0, domain:www.zhihu.com, path:/, expiry:null]
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection [id: 0][route: {s}->https://www.zhihu.com:443] can be kept alive indefinitely
o.a.h.i.c.PoolingHttpClientConnectionManager - Connection released: [id: 0][route: {s}->https://www.zhihu.com:443][total kept alive: 1; route allocated: 1 of 100; total allocated: 1 of 1]
u.c.webmagic.utils.CharsetUtils - Auto get charset: null
u.c.w.d.HttpClientDownloader - Charset autodetect failed, use UTF-8 as charset. Please specify charset in Site.setCharset()
u.c.w.d.HttpClientDownloader - downloading page success https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3
09:04:14.908 [pool-1-thread-1] INFO  us.codecraft.webmagic.Spider - page status code error, page https://www.zhihu.com/api/v4/questions/29688243/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%2Cupvoted_followees%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%3F%28type%3Dbest_answerer%29%5D.topics&limit=3&offset=3 , code: 401
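
The Apache HttpClient lines above show that the server answered the `offset=3` request with 401 ("Authentication required") but sent no authentication challenge (no `WWW-Authenticate` header), so the client had nothing to retry with. A common workaround is to send the request the way a logged-in browser would: with a browser-like `User-Agent` and the `Cookie` header copied from a logged-in session. This is offered as an assumption, not something confirmed in the post. The sketch below uses the JDK's `java.net.http` client (Java 11+) instead of WebMagic. The class name `ZhihuAuthedRequest` and the `ZHIHU_COOKIE` environment variable are hypothetical, and which cookies the API actually checks is not established here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ZhihuAuthedRequest {
    /**
     * Builds a GET request with a browser-like User-Agent and, if provided, the Cookie
     * header copied from a logged-in browser session (assumed, not verified, to be
     * what the API checks). An empty cookie is skipped rather than sent blank.
     */
    public static HttpRequest build(String url, String userAgent, String cookie) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .GET();
        if (cookie != null && !cookie.isEmpty()) {
            builder.header("Cookie", cookie);
        }
        return builder.build();
    }

    /** Sends the request and returns the status code, so a 401 can be checked explicitly. */
    public static int fetchStatus(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        String url = "https://www.zhihu.com/api/v4/questions/29688243/answers?limit=3&offset=3";
        // Hypothetical env var holding the Cookie header value from a logged-in browser.
        String cookie = System.getenv().getOrDefault("ZHIHU_COOKIE", "");
        HttpRequest request = build(url, "Mozilla/5.0", cookie);
        // Prints the headers that would be sent; no network call is made here.
        System.out.println(request.headers().map());
    }
}
```

If these headers do get a 200 from `fetchStatus`, the WebMagic equivalent would be to set them on the `Site` object (`Site.me().setUserAgent(...)`, `.addHeader(...)`, `.addCookie(...)`). Calling `.setCharset("UTF-8")` on the same `Site` should also address the charset warning in the log.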


Any pointers from the experts here would be greatly appreciated.
