Grigori SIDOROV
PhD, Professor and researcher, Regular member of
Phone: 52-55-57296000 ext. 56518, 56544
e-mail:
text processing techniques and systems, automatic dictionary processing,
automatic morphological analysis of different languages, automatic syntactic
analysis, anaphora resolution, word sense disambiguation, corpus linguistics,
parallel texts, linguistic software development.
LICENSE:
1. You can use all these programs freely for academic purposes. No warranty.
2. You should inform us about the usage of the programs, and
3. You should cite the corresponding papers in your publications obtained with the help of these programs.
Downloading means that you accept the license. Thank you.
System for translingual plagiarism detection and exploration (based on alignment and similarity at the paragraph level), for English and Spanish.
In collaboration with Paolo Rosso.
English-Spanish dictionary of weighted morphological forms. Forms are weighted according to the distribution of the corresponding grammar classes in corpora.
For example:
'cause porque 1.0000000
'til hasta 1.0000000
a un 0.4603677
a una 0.3662918
a unas 0.0734382
a uno 0.0031157
a unos 0.0967866
abaci ábaco 0.0561639
abaci ábacos 0.9438361
abacus ábaco 0.9890721
abacus ábacos 0.0109279
abacuses ábaco 0.0561639
abacuses ábacos 0.9438361
abandon abandonábamos 0.0024804
abandon abandonáis 0.0005694
abandon abandonáramos 0.0004860
abandon abandonáremos 0.0007113
abandon abandonásemos 0.0004860...
...abandon abandonaba 0.0779384
abandon abandonabais 0.0000805
abandon abandonaban 0.0226584...
In Unicode.
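The sample above suggests a simple reading of the dictionary: each line holds an English form, a Spanish form, and a weight, and the weights for one English form add up to 1. The following is a minimal sketch of loading such lines and picking the most likely Spanish form. It assumes that fields are separated by whitespace and contain no internal spaces, which the sample shows but the page does not state; the function names parse_weighted_forms and best_form are hypothetical and are not part of the distributed dictionary.

```python
from collections import defaultdict


def parse_weighted_forms(lines):
    """Group (Spanish form, weight) pairs by English form.

    Assumption: every entry is 'english spanish weight', whitespace-separated.
    """
    table = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        english, spanish, weight = parts
        table[english].append((spanish, float(weight)))
    return dict(table)


def best_form(table, english):
    """Return the highest-weighted Spanish form, or None if the word is unknown."""
    candidates = table.get(english)
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


# Lines copied from the sample shown on this page.
sample = [
    "a un 0.4603677",
    "a una 0.3662918",
    "a unas 0.0734382",
    "a uno 0.0031157",
    "a unos 0.0967866",
    "abaci ábaco 0.0561639",
    "abaci ábacos 0.9438361",
]

table = parse_weighted_forms(sample)
print(best_form(table, "a"))
print(best_form(table, "abaci"))
```

When reading the real file, open it with an explicit Unicode encoding; which encoding the file uses is not specified here, so check it before loading.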
Paper to cite for the English-Spanish dictionary of weighted morphological forms:
Grigori Sidorov, Alberto Barrón-Cedeño, Paolo Rosso. English-Spanish Large Statistical Dictionary of Inflectional Forms. In: Proc. of LREC 2010, 6 p.
Interface for the system for fast search of Maya glyphs based on their visual structural description. Available compressed as an EXE file or compressed as a ZIP file.
Beta version. The system uses the dictionary of J. Montgomery.
EXE: Download the Glyphs.exe file and execute it; the files will be copied to the folder you choose. Then execute the file SETUP.EXE.
ZIP: Download the Glyphs.zip file and unzip the files to the folder you choose. Then execute the file SETUP.EXE.
NEW version 04/09/2009. If you downloaded the previous version, please reinstall it.
Paper to cite for the glyph search system:
Obdulia Pichardo Lagunas, Grigori Sidorov. Diccionario de los glifos maya con descripción visual estructural. In: Proc. of International Conference EURALEX-2008, Barcelona, Spain, July 2008, pp. 747–751.
System for automatic morphological analysis of Spanish
NEW: A complete wordlist (beta version) generated with this system is available.
System for automatic morphological analysis of Russian
These are EXE files for Windows; DLLs are available on request.
These programs perform lemmatization and provide grammar information for each word form of Spanish or Russian, respectively.
See the detailed descriptions on the corresponding pages (follow the links).
Paper to cite for the morphological analysis systems:
A. Gelbukh, G. Sidorov. Approach to construction of automatic morphological analysis systems for inflective languages with little effort. In: Computational Linguistics and Intelligent Text Processing (CICLing-2003), Lecture Notes in Computer Science, N 2588, Springer-Verlag, 2003, pp. 215–220.
Download concordances for Russian (EXE for Windows). This program builds concordances for the Russian language. Its distinctive feature is that it can build a concordance for a set of grammar categories, e.g., all nouns in the dative singular.
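To illustrate what a concordance selected by grammar categories looks like, here is a minimal keyword-in-context sketch. It is not the program offered above: it assumes the text has already been analyzed morphologically into (word, tags) pairs, and the tag names NOUN, DAT, SG and the function name concordance are hypothetical.

```python
def concordance(tagged_tokens, wanted_tags, width=3):
    """Return (left context, keyword, right context) rows.

    A token is selected when its tag set contains every tag in wanted_tags.
    Assumption: tagged_tokens is a list of (word, set_of_tags) pairs.
    """
    wanted = set(wanted_tags)
    words = [word for word, _ in tagged_tokens]
    rows = []
    for i, (word, tags) in enumerate(tagged_tokens):
        if wanted <= set(tags):
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((left, word, right))
    return rows


# Hand-tagged toy sentence: "Я дал книгу брату вчера".
tokens = [
    ("Я", {"PRON", "NOM", "SG"}),
    ("дал", {"VERB", "PAST", "SG"}),
    ("книгу", {"NOUN", "ACC", "SG"}),
    ("брату", {"NOUN", "DAT", "SG"}),
    ("вчера", {"ADV"}),
]

# All nouns in the dative singular, with two words of context on each side.
rows = concordance(tokens, {"NOUN", "DAT", "SG"}, width=2)
for left, keyword, right in rows:
    print(f"{left} [{keyword}] {right}")
```

Matching on a subset of tags keeps the query flexible: asking for {"NOUN"} alone would return both nouns in the toy sentence.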
Paper for citing:
G. O. Sidorov. Lemmatization in automatized
system for compilation of personal style dictionaries of literature writers. –
Chapter in: “Word of Dostoyevsky”,
Download parser with Spanish grammar (EXE and DLL for Windows). This is a chart parser that uses a context-free (CF) grammar with elements of unification. An experimental CF grammar for Spanish is provided, along with tools for modifying it.
Paper for citing:
A. Gelbukh, G. Sidorov, S. Galicia Haro,
Scanned selected pages from the research journal “Polibits”, issue 37, 2008.
Scanned selected pages from the book “Artificial Intelligence for Humans: Service Robots and Social Modeling” (Grigori Sidorov, Ed.), 2008.
Scanned selected pages from the special issue of the journal Research in Computing Science, vol. 40 “Advance in Artificial Intelligence: Algorithms and Applications” (Grigori Sidorov, Ed.), 2008.
More than 140 scientific publications, 1 patent.
More than 150 citations of my work (excluding self-citations).
− Who’s Who in the World.
− Who’s Who in Science and Engineering.
− Editor-in-Chief of the research journal “Polibits”.
· Anatoly Baranov,
· Alexander Gelbukh, Grigori Sidorov, and Liliana Chanona-Hernández. Compilation of a Spanish representative corpus. Proc. CICLing-2002, Conference on Intelligent Text Processing and Computational Linguistics,
· Gaspár Ramírez, James L. Fidelholtz, Héctor Jiménez, Grigori Sidorov. Elaboración de un diccionario de verbos del español a partir de una lexicografía sistemática. In: “Avances en
You can find more information about
the papers, about our laboratory and about the annual International Conference
on computational linguistics CICLing (