龙空技术网

Scraping Baidu Translate with JS reverse engineering

陌上断肠


Scraping Baidu Translate with JS reverse engineering: introduction

Created: 2021-09-20 15:30, Monday

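This article builds a Python crawler that queries Baidu Translate directly. It involves a bit of JS reverse engineering: by analyzing the page's JavaScript we reconstruct the required request parameters, then send them to get the translation result.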

Open Baidu Translate, type in a few words (anything you like), then open the developer tools and look at the Network tab.

There we find the form data of the request:

from: en
to: zh
query: test
transtype: translang
simple_means_flag: 3
sign: 431039.159886
token: 8a137e841303b997efa7ed8d0881c83a
domain: common
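from and to are clearly the source and target languages (English to Chinese here), query is the word being looked up, transtype is the translation type, and sign and token are presumably encrypted parameters.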
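When copying these fields out of DevTools it can help to turn them into a Python dict so they can be compared or reused. A minimal sketch; parse_form_data is a hypothetical helper (not part of the original code) and it assumes one key: value pair per line, the way DevTools lists form data.

```python
def parse_form_data(text: str) -> dict[str, str]:
    """Parse 'key: value' lines copied from DevTools' Form Data panel."""
    form = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:  # ignore lines without a colon
            form[key.strip()] = value.strip()
    return form


# Fields copied from the first capture (query "test")
captured = """
from: en
to: zh
query: test
transtype: translang
simple_means_flag: 3
sign: 431039.159886
token: 8a137e841303b997efa7ed8d0881c83a
domain: common
"""
form = parse_form_data(captured)
print(form["query"], form["sign"])  # test 431039.159886
```

All values come back as strings; requests will send them as ordinary form fields either way.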

Now query a different word:

from: en
to: zh
query: result
transtype: translang
simple_means_flag: 3
sign: 586451.791010
token: 8a137e841303b997efa7ed8d0881c83a
domain: common

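Apart from query itself, the only parameter that changed is sign. token stays the same, and it can also be found in the Elements panel.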
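Comparing the two captures field by field makes this easy to check. A small sketch with a hypothetical changed_fields helper; the two dicts are the form values copied from the captures for test and result.

```python
def changed_fields(a: dict[str, str], b: dict[str, str]) -> set[str]:
    """Return the keys whose values differ (or exist in only one form)."""
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}


# Values copied from the two captures
first = {"from": "en", "to": "zh", "query": "test", "transtype": "translang",
         "simple_means_flag": "3", "sign": "431039.159886",
         "token": "8a137e841303b997efa7ed8d0881c83a", "domain": "common"}
second = {"from": "en", "to": "zh", "query": "result", "transtype": "translang",
          "simple_means_flag": "3", "sign": "586451.791010",
          "token": "8a137e841303b997efa7ed8d0881c83a", "domain": "common"}

print(sorted(changed_fields(first, second)))  # ['query', 'sign']
```

Since query is our own input, sign is the only value that has to be computed per request.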

Search all loaded files for sign.

Many of the matches merely contain the string sign. After tracking down the real sign parameter, we find:

sign: L(e)

Set a breakpoint on that sign line, trigger a translation, and click Step in the debugger to follow the call.

The debugger jumps into a function whose parameter r is exactly the word being queried.

Copy the code of function e(r) to a local file (I saved it as sign.js):

function e(r) {
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
    } else {
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
            "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
            C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
    }
    var u = void 0, l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
    u = null !== i ? i : (i = window[l] || "") || "";
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
        S[c++] = A >> 18 | 240,
        S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
        S[c++] = A >> 6 & 63 | 128),
        S[c++] = 63 & A | 128)
    }
    for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
        p += S[b],
        p = n(p, F);
    return p = n(p, D),
    p ^= s,
    0 > p && (p = (2147483647 & p) + 2147483648),
    p %= 1e6,
    p.toString() + "." + (p ^ m)
}
console.log(e('test')) // add this line to print the result
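The String.fromCharCode(...) chains in e(r) are only light obfuscation of constant strings. Decoding the character codes (copied from the code above) shows what they are: l is the global name "gtk", and F and D are the operation strings later handed to n. The variable names simply mirror the JS; nothing here is Baidu's own code.

```python
# Character codes copied from the String.fromCharCode(...) calls in e(r)
l = "".join(map(chr, [103, 116, 107]))
F = "".join(map(chr, [43, 45, 97])) + "".join(map(chr, [94, 43, 54]))
D = ("".join(map(chr, [43, 45, 51]))
     + "".join(map(chr, [94, 43, 98]))
     + "".join(map(chr, [43, 45, 102])))
print(l, F, D)  # gtk +-a^+6 +-3^+b+-f
```

Since l is "gtk", the fallback window[l] reads window.gtk, so supplying i by hand is enough when the script runs under node, where there is no window.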

Run it with node:

node sign.js

It fails with the error i is not defined.

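Back in the developer tools, stepping through the code shows that i = "320305.131321201". This is the page's gtk value: the code builds the name l from the character codes 103, 116 and 107, i.e. "gtk", and otherwise reads it from window[l].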

So define i = "320305.131321201" directly in the code:

var i = "320305.131321201"; // the page's gtk value, found while debugging
function e(r) {
    var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
    if (null === o) {
        var t = r.length;
        t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
    } else {
        // The page wraps split("") in a helper a() that is not defined here; it appears to
        // only convert its argument to an array, and split() already returns one.
        for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
            "" !== e[C] && f.push.apply(f, e[C].split("")),
            C !== h - 1 && f.push(o[C]);
        var g = f.length;
        g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
    }
    var u = void 0, l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
    u = null !== i ? i : (i = window[l] || "") || "";
    for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
        var A = r.charCodeAt(v);
        128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
        S[c++] = A >> 18 | 240,
        S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
        S[c++] = A >> 6 & 63 | 128),
        S[c++] = 63 & A | 128)
    }
    for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
        p += S[b],
        p = n(p, F);
    return p = n(p, D),
    p ^= s,
    0 > p && (p = (2147483647 & p) + 2147483648),
    p %= 1e6,
    p.toString() + "." + (p ^ m)
}
console.log(e(process.argv[2] || 'test')) // print the sign of the word given on the command line (default: test)
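Before hashing anything, e(r) shortens inputs longer than 30 characters to the first 10, the middle 10 and the last 10 characters, so only those characters influence sign. A sketch of that rule in Python; shorten_query is a hypothetical name. Python's len() and slicing count code points, which matches both branches of the JS code, since the branch for emoji and other surrogate pairs also works on whole code points.

```python
def shorten_query(query: str) -> str:
    """Mirror e(r)'s truncation: texts over 30 chars keep first/middle/last 10."""
    t = len(query)
    if t <= 30:
        return query
    mid = t // 2  # Math.floor(t / 2) in the JS
    return query[:10] + query[mid - 5:mid + 5] + query[-10:]


sample = "".join(chr(ord("a") + k % 26) for k in range(40))  # 40 letters a..z, a..n
print(shorten_query(sample))  # abcdefghijpqrstuvwxyefghijklmn
print(shorten_query("test"))  # test
```

One consequence: two long texts that share those 30 characters get the same sign.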

Running it with node again, it now fails with n is not defined.

Go back to the developer tools, find where n is defined, and copy that code into sign.js as well:

function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
        a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
        r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}
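n(r, o) reads o as a tiny instruction string, three characters per step: the first character picks the combine operation (+ adds, anything else XORs), the second picks the shift (+ means an unsigned right shift >>>, anything else a left shift <<), and the third is the shift amount as a hex digit. A sketch that spells out those steps for the two strings e(r) passes in, "+-a^+6" and "+-3^+b+-f"; explain_ops is a hypothetical helper for illustration only.

```python
def explain_ops(o: str) -> list[str]:
    """Translate an n(r, o) operation string into readable JS-like steps."""
    steps = []
    for t in range(0, len(o) - 2, 3):  # same loop bounds as the JS
        c = o[t + 2]
        amount = ord(c) - 87 if c >= "a" else int(c)  # hex digit, e.g. 'a' -> 10
        shift = f"r >>> {amount}" if o[t + 1] == "+" else f"r << {amount}"
        op = "+" if o[t] == "+" else "^"
        steps.append(f"r = r {op} ({shift})")
    return steps


for ops in ("+-a^+6", "+-3^+b+-f"):
    print(ops, explain_ops(ops))
# +-a^+6 ['r = r + (r << 10)', 'r = r ^ (r >>> 6)']
# +-3^+b+-f ['r = r + (r << 3)', 'r = r ^ (r >>> 11)', 'r = r + (r << 15)']
```

Applied once per UTF-8 byte with "+-a^+6" and once at the end with "+-3^+b+-f", these are exactly the steps of Bob Jenkins' one-at-a-time hash, here seeded with the first half of gtk; e(r) then XORs the result with the second half.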

Now sign is printed successfully, and it is identical to the sign captured earlier for the word test (431039.159886).
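If you would rather not depend on node, the two functions can be ported to Python. The following is a sketch of my own port, not the author's code, assuming the copied e(r) and n(r, o) are the complete algorithm and that gtk is still "320305.131321201" (both gtk and token may change whenever Baidu updates the page). to_int32 and baidu_sign are hypothetical names; to_int32 emulates JavaScript's 32-bit integer semantics for <<, >>>, + and ^. With the gtk value found while debugging, it reproduces the sign captured for test.

```python
def to_int32(x: int) -> int:
    """Emulate JavaScript's ToInt32: keep the low 32 bits as a signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def n(r: int, o: str) -> int:
    """Port of n(r, o): run the shift/add/xor steps encoded in o."""
    for t in range(0, len(o) - 2, 3):
        c = o[t + 2]
        a = ord(c) - 87 if c >= "a" else int(c)
        if o[t + 1] == "+":
            a = (r & 0xFFFFFFFF) >> a  # JS: r >>> a (unsigned shift)
        else:
            a = to_int32(r << a)       # JS: r << a
        r = to_int32(r + a) if o[t] == "+" else to_int32(r ^ a)
    return r


def baidu_sign(query: str, gtk: str = "320305.131321201") -> str:
    """Port of e(r): compute the sign value for query (gtk value assumed)."""
    if len(query) > 30:  # keep the first, middle and last 10 characters
        mid = len(query) // 2
        query = query[:10] + query[mid - 5:mid + 5] + query[-10:]
    m_str, s_str = gtk.split(".")
    m, s = int(m_str), int(s_str)
    p = m
    for byte in query.encode("utf-8"):  # the JS loop hand-encodes UTF-8
        p = n(p + byte, "+-a^+6")
    p = n(p, "+-3^+b+-f")
    p = to_int32(p ^ s)
    if p < 0:  # same as the JS fix-up to an unsigned value
        p = (p & 0x7FFFFFFF) + 0x80000000
    p %= 1_000_000
    return f"{p}.{p ^ m}"


print(baidu_sign("test"))  # 431039.159886
```

To use it in the crawler, the 'sign' field could be filled with baidu_sign(word) instead of calling get_sign(word); the rest of the request stays the same.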

Writing the Python code

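Note: the code below runs sign.js through node, so you need a Node.js environment. It also assumes sign.js prints the sign of the word it receives on the command line, i.e. that its last line is console.log(e(process.argv[2])) rather than the hard-coded e('test').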

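import json
import os
import subprocess
import time

import requests


class BaiduFanyi:
    def __init__(self) -> None:
        self.url = ''             # translation API URL, built in run()
        self.judge_type_url = ''  # fill in: language-detection API URL
        self.result = ''
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36',
            'referer': '',  # fill in: Baidu Translate page URL
            'origin': '',   # fill in: Baidu Translate origin
            'Cookie': 'YOUR_COOKIE',  # replace with your own cookie
        }
        self.data = {}

    # Call sign.js to get sign
    def get_sign(self, word):
        # sign.js must print the sign of its command-line argument.
        # Passing an argument list (no shell) keeps words with spaces intact.
        result = subprocess.run(['node', 'sign.js', word],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()

    def get_response(self, url, data):
        response = requests.post(url, data=data, headers=self.headers).content.decode('utf8')
        return json.loads(response)

    def run(self):
        print('Welcome ━(*`∀´*)ノ亻!')
        while True:
            word = input('Enter a word or Chinese text to look up (q to quit): ')
            if word == 'q':
                break
            os.system('cls' if os.name == 'nt' else 'clear')  # clear the console
            # Detect the input language: Chinese is translated to English, anything else to Chinese
            judge_lan = self.get_response(self.judge_type_url, {'query': word})
            lan_type = judge_lan['lan']
            tran_type = 'en' if lan_type == 'zh' else 'zh'
            self.url = f'{lan_type}&to={tran_type}'  # the translation endpoint URL goes before this query-string tail
            self.data = {
                'from': lan_type,
                'to': tran_type,
                'query': word,
                'transtype': 'translang',
                'simple_means_flag': 3,
                'sign': self.get_sign(word),
                'token': '8a137e841303b997efa7ed8d0881c83a',  # token found in the Elements panel
                'domain': 'common'
            }
            tran_data = self.get_response(self.url, self.data)
            try:
                self.result = " ".join(tran_data['dict_result']['simple_means']['word_means'])
                for i in range(22):  # short "translating" animation
                    time.sleep(0.01)
                    print('\r' + 'Translating' + '' * i, end='')
                print()
                print('')
                print('Translation:')
                print(self.result)
                print('⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡⚡')
            except (KeyError, TypeError):  # no dictionary result in the response
                print('No result found!!!')


if __name__ == '__main__':
    test = BaiduFanyi()
    test.run()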
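The program reads the translation from tran_data['dict_result']['simple_means']['word_means'] and treats any failure there as "no result". A sketch of the same lookup without relying on exceptions; word_means is a hypothetical helper, and the sample below is made up, shaped only after the keys the program reads (it is not a real Baidu response).

```python
from typing import Any, Optional


def word_means(response: dict[str, Any]) -> Optional[str]:
    """Join dict_result.simple_means.word_means, or return None if absent."""
    dict_result = response.get("dict_result")
    if not isinstance(dict_result, dict):
        return None
    simple_means = dict_result.get("simple_means")
    if not isinstance(simple_means, dict):
        return None
    means = simple_means.get("word_means")
    if not means:
        return None
    return " ".join(means)


# Made-up sample that only mimics the nesting the program reads
sample = {"dict_result": {"simple_means": {"word_means": ["meaning-1", "meaning-2"]}}}
print(word_means(sample))  # meaning-1 meaning-2
print(word_means({}))      # None
```

Returning None lets the caller print "No result found!!!" without a broad try/except around the printing code.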

Running result
