Commit 6865784

rna-transcription approach: a few improvements (#4052)
* rna-transcription approach: a few improvements
  - Renamed `chr` to `char` in the snippets because `chr` is a built-in function; shadowing it may not be a problem in this case, but it is still bad practice when it can be avoided.
  - The dictionary-join approach mentions list comprehensions, but it actually uses a generator expression. Corrected the explanation and expanded it with a list-comprehension-based implementation and a brief comparison.
  - The overview says one approach is four times faster. In a brief comparison, the speedup varied from 2.5x for a very short string up to 60x for a string of length 10^6. Probably not worth going into the details, but 4x is just inaccurate.
* convert code snippets to single quotes for consistency
* several updates following discussions
  - Replaced `char` with `nucleotide`, as this is terminology from the domain.
  - Rephrased a "see also" link to be more screen reader friendly.
  - A note about the exercise not requiring tests for invalid characters is present in one of the approaches. Copied it over to the other approach, for uniformity.
  - Rephrased mention about performance and speedup.
  - Replaced mention of ASCII with Unicode, adding a brief explanation and links.
* move note regarding testing for erroneous inputs to `introduction.md` ... because it applies to the exercise in general, not a particular approach.
  Re-applying missed commits from prior cherry-pick.
* Re-applied the commits from below via cherry-pick.
  convert code snippets to single quotes for consistency
1 parent d069b5d commit 6865784

5 files changed

+51
-22
lines changed
Lines changed: 27 additions & 5 deletions
@@ -1,11 +1,11 @@
11
# dictionary look-up with `join`
22

33
```python
4-
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
4+
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
55

66

77
def to_rna(dna_strand):
8-
return ''.join(LOOKUP[chr] for chr in dna_strand)
8+
return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)
99

1010
```
1111

@@ -16,15 +16,37 @@ but the `LOOKUP` dictionary is defined with all uppercase letters, which is the
1616
It indicates that the value is not intended to be changed.
1717

1818
In the `to_rna()` function, the [`join()`][join] method is called on an empty string,
19-
and is passed the list created from a [list comprehension][list-comprehension].
19+
and is passed a [generator expression][generator-expression].
2020

21-
The list comprehension iterates each character in the input,
21+
The generator expression iterates over each character in the input,
2222
looks up the DNA character in the look-up dictionary, and yields its matching RNA character.
2323

24-
The `join()` method collects the list of RNA characters back into a string.
24+
The `join()` method collects the RNA characters back into a string.
2525
Since an empty string is the separator for the `join()`, there are no spaces between the RNA characters in the string.
2626

27+
A generator expression is similar to a [list comprehension][list-comprehension], but instead of creating a list, it returns a generator, and iterating that generator yields the elements on the fly.
28+
29+
A variant that uses a list comprehension is almost identical, but note the additional square brackets inside the `join()`:
30+
31+
```python
32+
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
33+
34+
def to_rna(dna_strand):
35+
return ''.join([LOOKUP[nucleotide] for nucleotide in dna_strand])
36+
```
37+
38+
39+
For a relatively small number of elements, using lists is fine and may be faster, but as the number of elements increases, the memory consumption increases and performance decreases.
40+
You can read more about [when to choose generators over list comprehensions][list-comprehension-choose-generator-expression] to dig deeper into the topic.
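The trade-off described above can be checked with a rough measurement. The following is only a sketch: the function names `to_rna_generator` and `to_rna_list` and the 100,000-nucleotide input are made up for illustration, and the timings depend on the interpreter, the input, and the hardware, so no particular outcome is implied.

```python
import sys
import timeit

LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}


def to_rna_generator(dna_strand):
    # Generator expression: elements are produced on demand.
    return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)


def to_rna_list(dna_strand):
    # List comprehension: the whole list exists before join() is called.
    return ''.join([LOOKUP[nucleotide] for nucleotide in dna_strand])


# Hypothetical input, chosen only to make the difference visible.
strand = 'GCTA' * 25_000

# Size of the container objects themselves (not of the strings they refer to).
generator_size = sys.getsizeof(LOOKUP[nucleotide] for nucleotide in strand)
list_size = sys.getsizeof([LOOKUP[nucleotide] for nucleotide in strand])

generator_seconds = timeit.timeit(lambda: to_rna_generator(strand), number=5)
list_seconds = timeit.timeit(lambda: to_rna_list(strand), number=5)

print(f'generator object: {generator_size} bytes, list object: {list_size} bytes')
print(f'generator: {generator_seconds:.4f}s, list: {list_seconds:.4f}s')
```

Both functions return the same string; only the intermediate container differs, so it is worth measuring with representative data before preferring one form for speed.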
41+
42+
43+
~~~~exercism/note
44+
As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
45+
~~~~
46+
2747
[dictionaries]: https://docs.python.org/3/tutorial/datastructures.html?#dictionaries
2848
[const]: https://realpython.com/python-constants/
2949
[join]: https://docs.python.org/3/library/stdtypes.html?#str.join
3050
[list-comprehension]: https://realpython.com/list-comprehension-python/#using-list-comprehensions
51+
[list-comprehension-choose-generator-expression]: https://realpython.com/list-comprehension-python/#choose-generators-for-large-datasets
52+
[generator-expression]: https://realpython.com/introduction-to-python-generators/#building-generators-with-generator-expressions
Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
1-
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
1+
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
22

33

44
def to_rna(dna_strand):
5-
return ''.join(LOOKUP[chr] for chr in dna_strand)
5+
return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)

exercises/practice/rna-transcription/.approaches/introduction.md

Lines changed: 12 additions & 6 deletions
@@ -7,13 +7,13 @@ Another approach is to do a dictionary lookup on each character and join the res
77
## General guidance
88

99
Whichever approach is used needs to return the RNA complement for each DNA value.
10-
The `translate()` method with `maketrans()` transcribes using the [ASCII][ASCII] values of the characters.
10+
The `translate()` method with `maketrans()` transcribes using the [Unicode][Unicode] code points of the characters.
1111
Using a dictionary look-up with `join()` transcribes using the string values of the characters.
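The difference between the two kinds of keys can be seen by inspecting the tables directly. This is a minimal sketch: the names `TRANSLATION_TABLE`, `DICT_LOOKUP`, and `expected_table` are introduced here for illustration only and are not part of the approaches.

```python
# Table built by str.maketrans(): keys and values are integer code points.
TRANSLATION_TABLE = str.maketrans('GCTA', 'CGAU')

# Plain dictionary: keys and values are one-character strings.
DICT_LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}

# The same mapping as the dictionary, expressed with ord() on each character.
expected_table = {ord(dna): ord(rna) for dna, rna in DICT_LOOKUP.items()}

print(TRANSLATION_TABLE == expected_table)
print(ord('G'), '->', TRANSLATION_TABLE[ord('G')])
print('GCTA'.translate(TRANSLATION_TABLE))
print(''.join(DICT_LOOKUP[nucleotide] for nucleotide in 'GCTA'))
```

Either table encodes the same four substitutions; `translate()` consumes the integer-keyed one, while the dictionary look-up indexes by the characters themselves.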
1212

1313
## Approach: `translate()` with `maketrans()`
1414

1515
```python
16-
LOOKUP = str.maketrans("GCTA", "CGAU")
16+
LOOKUP = str.maketrans('GCTA', 'CGAU')
1717

1818

1919
def to_rna(dna_strand):
@@ -26,20 +26,26 @@ For more information, check the [`translate()` with `maketrans()` approach][appr
2626
## Approach: dictionary look-up with `join()`
2727

2828
```python
29-
LOOKUP = {"G": "C", "C": "G", "T": "A", "A": "U"}
29+
LOOKUP = {'G': 'C', 'C': 'G', 'T': 'A', 'A': 'U'}
3030

3131

3232
def to_rna(dna_strand):
33-
return ''.join(LOOKUP[chr] for chr in dna_strand)
33+
return ''.join(LOOKUP[nucleotide] for nucleotide in dna_strand)
3434

3535
```
3636

3737
For more information, check the [dictionary look-up with `join()` approach][approach-dictionary-join].
3838

3939
## Which approach to use?
4040

41-
The `translate()` with `maketrans()` approach benchmarked over four times faster than the dictionary look-up with `join()` approach.
41+
If performance matters, consider using the [`translate()` with `maketrans()` approach][approach-translate-maketrans].
42+
How an implementation behaves in terms of performance may depend on the actual data being processed, on hardware, and other factors.
4243

43-
[ASCII]: https://www.asciitable.com/
44+
45+
~~~~exercism/note
46+
As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
47+
~~~~
48+
49+
[Unicode]: https://en.wikipedia.org/wiki/Unicode
4450
[approach-translate-maketrans]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/translate-maketrans
4551
[approach-dictionary-join]: https://exercism.org/tracks/python/exercises/rna-transcription/approaches/dictionary-join
Lines changed: 9 additions & 8 deletions
@@ -1,7 +1,7 @@
11
# `translate()` with `maketrans()`
22

33
```python
4-
LOOKUP = str.maketrans("GCTA", "CGAU")
4+
LOOKUP = str.maketrans('GCTA', 'CGAU')
55

66

77
def to_rna(dna_strand):
@@ -15,20 +15,21 @@ Python doesn't _enforce_ having real constant values,
1515
but the `LOOKUP` translation table is defined with all uppercase letters, which is the naming convention for a Python [constant][const].
1616
It indicates that the value is not intended to be changed.
1717

18-
The translation table that is created uses the [ASCII][ASCII] values (also called the ordinal values) for each letter in the two strings.
19-
The ASCII value for "G" in the first string is the key for the ASCII value of "C" in the second string, and so on.
18+
The translation table that is created uses the [Unicode][Unicode] _code points_ (sometimes called the ordinal values) for each letter in the two strings.
19+
As Unicode was designed to be backwards compatible with [ASCII][ASCII] and because the exercise uses Latin letters, the code points in the translation table can be interpreted as ASCII.
20+
However, the functions can deal with any Unicode character.
21+
You can learn more by reading about [strings and their representation in the Exercism Python syllabus][concept-strings].
22+
23+
The Unicode code point for "G" in the first string is the key for the Unicode code point of "C" in the second string, and so on.
2024

2125
In the `to_rna()` function, the [`translate()`][translate] method is called on the input,
2226
and is passed the translation table.
2327
The output of `translate()` is a string where all of the input DNA characters have been replaced by their RNA complement in the translation table.
2428

25-
26-
~~~~exercism/note
27-
As of this writing, no invalid DNA characters are in the argument to `to_rna()`, so there is no error handling required for invalid input.
28-
~~~~
29-
3029
[dictionaries]: https://docs.python.org/3/tutorial/datastructures.html?#dictionaries
3130
[maketrans]: https://docs.python.org/3/library/stdtypes.html?#str.maketrans
3231
[const]: https://realpython.com/python-constants/
3332
[translate]: https://docs.python.org/3/library/stdtypes.html?#str.translate
3433
[ASCII]: https://www.asciitable.com/
34+
[Unicode]: https://en.wikipedia.org/wiki/Unicode
35+
[concept-strings]: https://exercism.org/tracks/python/concepts/strings

exercises/practice/rna-transcription/.approaches/translate-maketrans/snippet.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
LOOKUP = str.maketrans("GCTA", "CGAU")
1+
LOOKUP = str.maketrans('GCTA', 'CGAU')
22

33

44
def to_rna(dna_strand):
