Using GitHub's API to search for code references across multiple organizations
Background
As part of a modernization project we're trying to split up our ~900 table MySQL database into much fewer DynamoDB tables. In order to evaluate this, we need to clarify the impact to our code. Easy enough, right? Well... we have two GitHub organizations with a combined 800 repos to look through.
To help with this, I wrote a script to do code searches with GitHub's v3 API.
Pre-reqs
You'll need a personal access token with Read access defined in your environment variables under GITHUB_CREDENTIALS_PSW.
I have Node version v12.13.1 installed.
Example code is here:
There are only 3 dependencies installed into the project... Node types, TypeScript and axios.
A search.json file also needs to be created. The JSON object includes a list of code strings to search for and the GitHub organizations to search through.
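As a sketch, a minimal search.json might look like the following. The codeStrings and organizations keys match what the script reads later; the organization names here are placeholders, not the ones actually used in the article.

```json
{
  "codeStrings": ["password"],
  "organizations": ["example-org-one", "example-org-two"]
}
```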
In the example I just did a basic search for password across a couple larger organizations... don't read into that... I just knew it'd be a common thing mentioned in at least one place.
In my actual use case the list of code strings I used were the table names. For 32 tables x 2 organizations this will be 64 API calls (which is over the authenticated user's rate limit of 30 / minute... more on that later).
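A quick back-of-the-envelope check of those numbers (using the figures from the text) shows why rate limiting is unavoidable here:

```typescript
// 32 tables searched across 2 organizations, against GitHub's
// 30-searches-per-minute limit for authenticated users.
const tables = 32;
const orgs = 2;
const searchesPerMinute = 30;

const totalCalls = tables * orgs; // 64 API calls
// At best the run spans 3 rate-limit windows, i.e. ~3 minutes minimum.
const minutesLowerBound = Math.ceil(totalCalls / searchesPerMinute); // 3

console.log(totalCalls, minutesLowerBound);
```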
The Script
The script is fairly straight-forward. First I do the imports and initialize the reporting object.
const fs = require("fs");
const axios = require("axios");
const searchData = require("./search.json");

// personal access token stored in env
const githubReadApiKey = process.env.GITHUB_CREDENTIALS_PSW;

const findings: any = {
  repos: {},
  code: {},
};
Next I have a helper function... this just makes setTimeout a Promise so I can use it asynchronously later.
async function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
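To show how the helper behaves, here's a small standalone sketch (the helper is repeated so the snippet runs on its own): awaiting it pauses the async function without blocking the event loop.

```typescript
// Wrap setTimeout in a Promise so it can be awaited.
async function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pause for ~50ms and report how long we actually waited.
async function demo(): Promise<number> {
  const start = Date.now();
  await sleep(50); // non-blocking pause
  return Date.now() - start;
}

demo().then((elapsed) => console.log(elapsed >= 45)); // true
```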
This getRateLimit function isn't specifically used but is useful for testing. Authenticated API calls are allowed to make 30 searches per minute... BUT it turns out GitHub also does abuse detection.
async function getRateLimit() {
  return axios.get("https://api.github.com/rate_limit", {
    headers: {
      Authorization: `token ${githubReadApiKey}`,
    },
  });
}
The searchCode function is the main API call that does the searching. I had to build in some multiple-try / wait code due to GitHub responding with You have triggered an abuse detection mechanism. Please wait a few minutes before you try again. on occasion (even when under the API rate limit for searching). Fortunately their docs include a way around this: https://developer.github.com/v3/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits
The response includes a retry-after header... which the script detects and waits for that time (typically a minute) + 1 second.
async function searchCode(
  codeStr: string,
  org?: string
): Promise<SearchResults | null> {
  const orgStr = org ? `+org:${org}` : "";
  const attempts = 2;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      const res = await axios
        .get(
          `https://api.github.com/search/code?q=${encodeURIComponent(
            codeStr
          )}${orgStr}&per_page=100`,
          {
            // accept any status so rate-limit responses can be inspected
            validateStatus: function () {
              return true;
            },
            headers: {
              Authorization: `token ${githubReadApiKey}`,
            },
          }
        )
        .catch((e: Error) => {
          console.error(e);
        });
      if (res.status > 200) {
        console.log(res.data.message);
        const retryAfter = parseInt(res.headers["retry-after"]);
        console.log(
          `Sleeping for ${retryAfter + 1} seconds before trying again...`
        );
        await sleep((retryAfter + 1) * 1000);
      } else {
        return res.data;
      }
    } catch (e) {
      console.error(e);
    }
  }
  // shouldn't get here...
  return Promise.resolve(null);
}
The results are then split up into what I think are useful metrics...
async function processResults(results: any, codeStr: string, org?: string) {
  console.log(`${codeStr}: ${results.items.length} - ${results.total_count}`);
  const items = results.items;
  findings.code[codeStr].count =
    findings.code[codeStr].count + results.total_count;
  items.forEach((item: any) => {
    // track which repos each search term was found in
    if (
      findings.code[codeStr].repos.indexOf(item.repository.full_name) === -1
    ) {
      findings.code[codeStr].repos.push(item.repository.full_name);
      findings.code[codeStr].repoCount = findings.code[codeStr].repos.length;
    }
    // track per-repo details: paths, search-term hits, and counts
    if (Object.keys(findings.repos).indexOf(item.repository.full_name) === -1) {
      findings.repos[item.repository.full_name] = {
        paths: [
          {
            path: item.path,
            score: item.score,
            url: item.html_url,
          },
        ],
        code: {},
        codeCount: 1,
      };
      findings.repos[item.repository.full_name].code[codeStr] = 1;
    } else {
      findings.repos[item.repository.full_name].paths.push({
        path: item.path,
        score: item.score,
        url: item.html_url,
      });
      if (
        Object.keys(findings.repos[item.repository.full_name].code).indexOf(
          codeStr
        ) === -1
      ) {
        findings.repos[item.repository.full_name].code[codeStr] = 1;
        findings.repos[item.repository.full_name].codeCount = Object.keys(
          findings.repos[item.repository.full_name].code
        ).length;
      } else {
        findings.repos[item.repository.full_name].code[codeStr] =
          findings.repos[item.repository.full_name].code[codeStr] + 1;
      }
    }
    findings.repos[item.repository.full_name].pathCount =
      findings.repos[item.repository.full_name].paths.length;
  });
}
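To make the aggregation easier to follow, here's a simplified, self-contained sketch of the same bookkeeping run against a fake search response (the repo name, paths, and search term are made up for illustration):

```typescript
// Simplified version of the findings aggregation, fed a mock API response.
const findings: any = { repos: {}, code: {} };

function aggregate(results: any, codeStr: string) {
  findings.code[codeStr] = findings.code[codeStr] ?? {
    count: 0,
    repos: [],
    repoCount: 0,
  };
  findings.code[codeStr].count += results.total_count;
  for (const item of results.items) {
    const repo = item.repository.full_name;
    if (!findings.code[codeStr].repos.includes(repo)) {
      findings.code[codeStr].repos.push(repo);
      findings.code[codeStr].repoCount = findings.code[codeStr].repos.length;
    }
    findings.repos[repo] = findings.repos[repo] ?? { paths: [], code: {} };
    findings.repos[repo].paths.push({ path: item.path, url: item.html_url });
    findings.repos[repo].code[codeStr] =
      (findings.repos[repo].code[codeStr] ?? 0) + 1;
    findings.repos[repo].pathCount = findings.repos[repo].paths.length;
  }
}

// Two hits for one search term, both in the same (hypothetical) repo.
const fakeResponse = {
  total_count: 2,
  items: [
    { repository: { full_name: "org/app" }, path: "src/a.ts", html_url: "u1" },
    { repository: { full_name: "org/app" }, path: "src/b.ts", html_url: "u2" },
  ],
};
aggregate(fakeResponse, "users_table");
console.log(findings.code["users_table"].repoCount); // 1
console.log(findings.repos["org/app"].pathCount); // 2
```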
Finally... I use an async function to run all of these. I flatten the searches into one list, do the API calls in series with some post-processing and write it to an output.json file.
async function main() {
  console.log("Starting Search...");
  const flattenedSearches: string[][] = [];
  searchData.codeStrings.forEach((codeStr: string) => {
    searchData.organizations.forEach((org: string) =>
      flattenedSearches.push([codeStr, org])
    );
  });
  for (let searchInd = 0; searchInd < flattenedSearches.length; searchInd++) {
    const search = flattenedSearches[searchInd];
    const searchResults = await searchCode(search[0], search[1]);
    // only initialize once so counts accumulate across organizations
    if (!findings.code[search[0]]) {
      findings.code[search[0]] = {
        count: 0,
        repos: [],
        repoCount: 0,
      };
    }
    await processResults(searchResults, search[0], search[1]);
  }
  findings.priority = {
    repos: Object.keys(findings.repos).sort(
      (a, b) => findings.repos[b].pathCount - findings.repos[a].pathCount
    ),
    code: Object.keys(findings.code).sort(
      (a, b) => findings.code[b].count - findings.code[a].count
    ),
  };
  let data = JSON.stringify(findings, null, 2);
  fs.writeFileSync("output.json", data);
  console.log("Search Complete!");
}

main().catch((e) => console.error(e));
The Output
The output breaks things down into the repositories that were found, the code and a prioritization of what to look at (based on occurrence):
- repos... the repos found
  - repos.<repo>.code... what code was in them
  - repos.<repo>.codeCount and repos.<repo>.pathCount... some basic counts for readability
- code... the original search terms
  - code.<code>.repos... the repos found for that code
  - code.<code>.count... a count of "mentions"
  - code.<code>.repoCount... the number of repos that code was found in
- priority... prioritized lists of what to look at by count (start at the top of the list)
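Put together, a hypothetical excerpt of output.json might look like this (repo and table names are made up; the real file would have many more entries):

```json
{
  "repos": {
    "org/app": {
      "paths": [{ "path": "src/db.ts", "score": 1.0, "url": "..." }],
      "code": { "users_table": 1 },
      "codeCount": 1,
      "pathCount": 1
    }
  },
  "code": {
    "users_table": { "count": 1, "repos": ["org/app"], "repoCount": 1 }
  },
  "priority": { "repos": ["org/app"], "code": ["users_table"] }
}
```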
I generally find this to be enough data to do some further post-processing on to generate graphs such as this (from my actual data):
Unfortunately, this only searches for table names... not actual USAGE of them... so there's still a lot of data to go through.
If you have any useful tools for doing this type of refactoring analysis, let me know in the comments.
Source: https://dev.to/martzcodes/using-github-s-api-to-search-for-code-references-across-multiple-organizations-337l