User:Flyax/New pages by a user
The following two scripts list all pages created by a given user. The first fetches the user's contributions and produces a list of every entry the user has modified. The second fetches the first revision of each of these entries, identifies its creator, and adds the headword to the final list if the creator is the user we are interested in.
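The creator check in the second step can be tried offline. The sketch below is not part of the scripts: `first_author` and `same_user` are hypothetical helper names, and the XML is a made-up sample in the general shape of an API reply, on the assumption that the first `user="..."` attribute in the reply belongs to the first revision.

```bash
#!/bin/bash
# Sketch only: first_author and same_user are hypothetical helpers.

first_author() {
    # Print the value of the first user="..." attribute found on stdin.
    awk -F 'user="' 'NF > 1 { split($2, rest, "\""); print rest[1]; exit }'
}

same_user() {
    # Case-insensitive comparison, mirroring the toupper() comparison in
    # newbyuser.sh. Requires bash 4 or later for ${var^^}.
    [ "${1^^}" = "${2^^}" ]
}

# Made-up sample reply, not real data.
sample='<api><query><pages><page pageid="1" ns="0" title="foo"><revisions><rev user="Flyax" /></revisions></page></pages></query></api>'

author=$(printf '%s\n' "$sample" | first_author)
if same_user "$author" "flyax"; then
    echo "created by $author"
fi
```

Only when the two names match is the title kept for the final list.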
How to run
- We need to be patient: the first script pauses 5 seconds between batches of 500 contributions, and the second makes one API request per title.
- We need a computer running Linux.
- The first script creates a temporary sub-directory, "temp_npbu"; the second writes the final list there as "pagesby$.txt", where $ is the username.
- If the user has moved any pages, the redirects created by those moves will also appear in the final list.
- For example:
./titlesbyuser.sh Flyax
./newbyuser.sh Flyax
The final list will be in "pagesbyFlyax.txt" in the "temp_npbu" sub-directory.
titlesbyuser.sh
#!/bin/bash
# Collects the titles of all pages edited by a user on en.wiktionary.org.

usage() {
    echo "Usage: $0 username"
    echo "This script gets all titles that have been modified "
    echo "by a specific user on en.wiktionary.org."
    echo
    echo "For example:"
    echo "$0 Flubot"
    echo
    echo "The list of titles will be created in the temporary directory"
    echo "temp_npbu/titlesFlubot.txt"
    exit 1
}

if [ -z "$1" ]; then
    usage
fi

# Make a pipeline fail when curl fails, not only when sed fails.
set -o pipefail

user="$1"
r=0
tmp="./temp_npbu"
mkdir -p "$tmp"

changes="$tmp/temp$user.xml"
changes1="$tmp/titles$user.xml"
titles1="$tmp/titles$user.txt"

while true; do
    # The first request starts at the newest contribution;
    # later requests continue from $startdate.
    if [ "$r" -eq 0 ]; then
        curl --retry 10 -f "https://en.wiktionary.org/w/api.php?action=query&list=usercontribs&format=xml&uclimit=500&ucuser=$user" | sed -e 's/>/>\n/g' > "$changes"
    else
        curl --retry 10 -f "https://en.wiktionary.org/w/api.php?action=query&list=usercontribs&format=xml&uclimit=500&ucstart=$startdate&ucuser=$user" | sed -e 's/>/>\n/g' > "$changes"
    fi
    # Save the status before testing it; the test itself overwrites $?.
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Error $rc from curl, unable to get user's contributions, bailing"
        exit 1
    fi

    # getting titles
    grep title "$changes" > "$changes1"
    while read -r tt; do
        echo "$tt" | awk -F 'title="' '{ print $2 }' | awk -F '"' '{ print $1 }' >> "$titles1"
    done < "$changes1"

    # getting the timestamp for the next batch from
    # <usercontribs ucstart=
    # Note: this relies on the API returning a ucstart continuation value;
    # newer MediaWiki versions may return a different parameter instead.
    startdate=$(grep "<usercontribs ucstart=" "$changes" | awk -F 'ucstart="' '{ print $2 }' | awk -F '"' '{ print $1 }')

    # if there is no timestamp then ... end
    if [[ -z "$startdate" ]]; then
        break
    fi
    r=$((r + 1))
    sleep 5
done
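The title extraction above can be tested without network access. The sketch below is an alternative, not the script's own method: it swaps the sed, grep and per-line loop for a single awk pass that splits the input on `>`. `extract_titles` is a hypothetical helper name, and the XML is a made-up sample rather than real API output.

```bash
#!/bin/bash
# Sketch only: extract_titles is a hypothetical helper.

extract_titles() {
    # Treat ">" as the record separator so each tag is one record, then
    # print the text between title=" and the next double quote.
    awk -v RS='>' -F 'title="' 'NF > 1 { split($2, rest, "\""); print rest[1] }'
}

# Made-up sample in the general shape of a usercontribs reply.
sample='<usercontribs><item user="Example" pageid="1" title="foo bar" /><item user="Example" pageid="2" title="baz" /></usercontribs>'

printf '%s\n' "$sample" | extract_titles
```

Like the original pipeline, this leaves XML entities such as `&amp;` inside titles undecoded.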
newbyuser.sh
#!/bin/bash
# Keeps only the titles whose first revision was made by the given user.

usage() {
    echo "Usage: $0 username"
    echo "This script gets all titles that have been created "
    echo "by a specific user on en.wiktionary.org."
    echo
    echo "For example:"
    echo "$0 Flubot"
    echo
    echo "The list of titles will be created in the temporary directory"
    echo "temp_npbu/pagesbyFlubot.txt"
    exit 1
}

if [ -z "$1" ]; then
    usage
fi

user="$1"
# Upper-case copy of the username for a case-insensitive comparison.
user1=$(echo "$user" | awk '{print toupper($0)}')
tmp="./temp_npbu"
titles1="$tmp/titles$user.txt"
titles2="$tmp/titles$user.list"

# Remove duplicate titles.
sort -u "$titles1" > "$titles2"

while read -r title; do
    title0=$(echo "$title" | sed -e "s/ /_/g")
    echo
    # rvdir=newer with rvlimit=1 returns the oldest revision, i.e. the page creation.
    info=$(curl --retry 10 -f "https://en.wiktionary.org/w/api.php?action=query&prop=revisions&format=xml&titles=$title0&rvlimit=1&rvprop=user&rvdir=newer")
    # Save the status before testing it; the test itself overwrites $?.
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "Error $rc from curl, bailing"
        exit 1
    fi
    # Take the first user="..." attribute in the reply.
    uname=$(echo "$info" | awk -F 'user="' 'NF > 1 { print $2; exit }' | awk -F '"' '{ print $1 }')
    uname1=$(echo "$uname" | awk '{print toupper($0)}')
    echo
    echo "$title by $uname"
    echo
    if [ "$uname1" = "$user1" ]; then
        echo "$title" >> "$tmp/pagesby$user.txt"
    fi
done < "$titles2"
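The script above only replaces spaces with underscores before placing a title in the URL, so a title containing a character such as `&` or `?` would alter the query string. One way to avoid this is to percent-encode the title first. The sketch below is an assumption about how that could be done, not part of the original script; `urlencode` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: urlencode is a hypothetical helper.

urlencode() {
    # Percent-encode every byte except the unreserved characters
    # A-Z a-z 0-9 - . _ ~. Working on the hex dump from od keeps the
    # result independent of the current locale. Requires bash 4 or later
    # for ${var^^}.
    local byte out=""
    for byte in $(printf '%s' "$1" | od -An -v -tx1); do
        case "$byte" in
            2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
                # Unreserved character: turn the hex code back into the character.
                out+=$(printf "\\x$byte") ;;
            *)
                out+="%${byte^^}" ;;
        esac
    done
    printf '%s\n' "$out"
}

urlencode "AT&T"
urlencode "foo bar"
```

The encoded value would then be used in place of `$title0`, for example `titles=$(urlencode "$title")`. curl's own `-G --data-urlencode` options are another way to achieve the same effect.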