Hi
I have downloaded and unzipped the xml dump of Wikipedia (40+GB). I want touse Python and the SAX module (running under Windows 7) to carry out off-line phrase-searches of Wikipedia and to return a count of the number of hits for each search. Typical phrase-searches might be "of the dog" and "dog's".
I have some limited prior programming experience (from many years ago) and I am currently learning Python from a course of YouTube tutorials. Before Iget much further, I wanted to ask:
Is what I am trying to do actually feasible?
Are there any example programs or code snippets that would help me?
Any advice or guidance would be gratefully received.
Best regards,
Kevin Glover
I have downloaded and unzipped the xml dump of Wikipedia (40+GB). I want touse Python and the SAX module (running under Windows 7) to carry out off-line phrase-searches of Wikipedia and to return a count of the number of hits for each search. Typical phrase-searches might be "of the dog" and "dog's".
I have some limited prior programming experience (from many years ago) and I am currently learning Python from a course of YouTube tutorials. Before Iget much further, I wanted to ask:
Is what I am trying to do actually feasible?
Are there any example programs or code snippets that would help me?
Any advice or guidance would be gratefully received.
Best regards,
Kevin Glover
Aucun commentaire:
Enregistrer un commentaire