Opened 15 years ago

Last modified 11 months ago

#5938 new defect

Queries containing '.' return 0 results from RepoSearch

Reported by: bugs@…
Owned by: Alex Smith
Priority: high
Component: RepoSearchPlugin
Severity: normal
Keywords:
Cc:
Trac Release: 0.11

Description

A search query that contains '.' will not return results from the repository. For example, a file that has been indexed contains the string 'a.smith', as does a ticket entry. Searching for 'a.smith' returns only the hit from the ticket, not the one from the repo. Searching for 'smith' returns hits from both the repo and the tickets. My repo consists entirely of router configurations, so being able to submit queries containing dots (e.g. IP addresses) is rather important functionality.

Attachments (1)

indexer.py.patch (357 bytes) - added by Fernan Aguero 14 years ago.
suggested change in regexp

Change History (13)

comment:1 Changed 15 years ago by Alec Thomas

Tokenisation is governed by tracreposearch.indexer.Indexer._strip; you'll have to modify the source to include "." as a valid word character.

This would probably be useful to expose as a config option.
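
As an illustration only, here is a minimal sketch of why a dotted query finds nothing when words are extracted with '\w+', and how a configurable pattern could change that. The tokenize function and the '[\w.]+' pattern are hypothetical; the actual Indexer._strip code may look different.

```python
import re

# Hypothetical stand-in for the plugin's word extraction; the real
# Indexer._strip may be implemented differently.
DEFAULT_PATTERN = r"\w+"


def tokenize(text, pattern=DEFAULT_PATTERN):
    """Return every substring of `text` matched by `pattern`."""
    return re.findall(pattern, text)


line = "username a.smith ip 10.0.0.1"

# With \w+ the dot acts as a separator, so 'a.smith' is never indexed
# as a single word.
default_tokens = tokenize(line)

# Treating '.' as a word character (an assumed alternative pattern that
# a config option could supply) keeps dotted names and IP addresses whole.
dotted_tokens = tokenize(line, r"[\w.]+")

print(default_tokens)
print(dotted_tokens)
```

If the pattern were read from a configuration option, sites that index router configurations or SQL could opt in to dotted tokens without patching the source.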

comment:2 Changed 15 years ago by Ryan J Ollos

Owner: changed from Alec Thomas to Ryan J Ollos

Reassigning ticket after changing maintainer of plugin.

comment:3 Changed 14 years ago by Fernan Aguero

Priority: changed from normal to high

This IS important.

My repository is full of SQL-like code that contains table names and column names, which are separated by dots, e.g. 'table.column'.

Also, a dot is an important operator in many programming languages. Searching for tokens separated with dots (e.g. object.action()) is a common task that IMO should be supported in searches.

In both these examples it's not possible to replace the search terms:

  • searching for 'table' or 'object' alone is not useful
  • searching for 'column' or 'action' alone is also not useful

There are zillions of lines in my code referring to these items alone. I only want the lines containing both tokens, and preferably I want these tokens joined together with a dot, so that I'm sure of getting functionally relevant results.

BTW: thanks for the plugin!

comment:4 Changed 14 years ago by Ryan J Ollos

My opinion is that this plugin is very broken, which is why I don't expect to get at this issue for some time. If you look at the other open tickets, there are some major show-stoppers. However, I'd be happy to apply a patch if you generate one.

comment:5 Changed 14 years ago by Fernan Aguero

Replying to rjollos:

My opinion is that this plugin is very broken, which is why I don't expect to get at this issue for some time. If you look at the other open tickets, there are some major show-stoppers. However, I'd be happy to apply a patch if you generate one.

Hmm, I can see what you mean.

I edited 'tracreposearch.indexer.Indexer._strip' as suggested and changed the regular expression from '\w+' to '\S+', and reindexed. Now, searches kind of work.

i) Searching for 'table.column' is now OK (sometimes).

ii) But searching for Perl modules (which use '::' as a separator) doesn't work (no matches found), e.g. GD::Graph, Data::Dumper, etc. I can't see why ... \S+ is supposed to match any non-whitespace character, and that includes '::'. In fact, the regexp works fine in other contexts (Perl, vim).

In any case, as you say, the plugin seems to be broken, because running a search intermittently produces an error for no apparent reason:

  • Running a search for 'Data::Dumper' produces no results, as mentioned
  • However, searching for 'Dumper' alone (without the quotes) produces the following error (note the added semicolon after the word):
Oops…
Trac detected an internal error:
KeyError: 'dumper;'
This is probably a local installation issue.

Python Traceback

Most recent call last:
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 450, in _dispatch_request
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 206, in dispatch
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/search/web_ui.py", line 107, in process_request
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 112, in get_search_results
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 82, in <lambda>
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 97, in wrap
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 280, in find_words
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 42, in __getitem__
  • Running a search for 'Genes' (there's a perl module in my code named 'weight::Genes') produced the following error:
Oops…
Trac detected an internal error:
KeyError: "$gdi_genes_rs=$c->model('weight"
This is probably a local installation issue.

Python Traceback

Most recent call last:
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 450, in _dispatch_request
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/web/main.py", line 206, in dispatch
File "/usr/local/lib/python2.6/site-packages/Trac-0.11.7-py2.6.egg/trac/search/web_ui.py", line 107, in process_request
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 112, in get_search_results
File "build/bdist.freebsd-6.4-RELEASE-p10-i386/egg/tracreposearch/search.py", line 82, in <lambda>
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 97, in wrap
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 280, in find_words
File "/usr/local/lib/python2.6/site-packages/tracreposearch-0.2-py2.6.egg/tracreposearch/indexer.py", line 42, in __getitem__

Maybe the KeyError has some clue about what's going on?

Regarding your suggestion of me providing a patch, I for one would suggest to change the original regexp to '\S+'. That would be my patch.

Regarding the other problems I'm not proficient in Python so I can't offer much help, but I can certainly help debug reposearch ...

Cheers,

-- fernan

Changed 14 years ago by Fernan Aguero

Attachment: indexer.py.patch added

suggested change in regexp

comment:6 in reply to:  5 Changed 14 years ago by Ryan J Ollos

Status: changed from new to assigned

Replying to fernan:

Regarding the other problems I'm not proficient in Python so I can't offer much help, but I can certainly help debug reposearch ...

I'll take a look at the patch shortly, and will definitely take you up on that offer to do some testing and debugging!

comment:7 Changed 14 years ago by Ryan J Ollos

Thanks for this hint, it helped me with #8266 as well.

comment:8 Changed 14 years ago by Ryan J Ollos

(In [9663]) Use \S in the regular expression that extracts words. \S will match any non-whitespace character, whereas \w only matches alphanumeric characters and the underscore. Refs #5938.
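
The difference between the two character classes can be checked with a short sketch like the one below. The sample lines are made up for illustration, and the snippet only uses Python's re module; it is not the plugin's code.

```python
import re

# Made-up sample lines resembling the examples discussed in this ticket.
samples = ["select table.column from t", "use Data::Dumper;"]

# \w+ splits on every character that is not alphanumeric or underscore.
word_tokens = [re.findall(r"\w+", s) for s in samples]

# \S+ splits on whitespace only, so punctuation stays inside the token.
nonspace_tokens = [re.findall(r"\S+", s) for s in samples]

print(word_tokens)
print(nonspace_tokens)
```

Note that with \S+ the second line yields the token 'Data::Dumper;' with the trailing semicolon attached. If the index is keyed on exact tokens (an assumption about the plugin), a query for 'Data::Dumper' would not match that key, which may be related to the missing results and the KeyError: 'dumper;' reported in comment:5.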

comment:10 Changed 13 years ago by Ryan J Ollos

Status: changed from assigned to new

comment:11 Changed 10 years ago by Ryan J Ollos

Owner: changed from Ryan J Ollos to anonymous

comment:12 Changed 11 months ago by Ryan J Ollos

Owner: changed from anonymous to Alex Smith
Status: changed from new to assigned

comment:13 Changed 11 months ago by Ryan J Ollos

Status: changed from assigned to new
